1 Introduction


Since inverse problems arise in many fields of science, statistical inverse problems have
attracted growing interest over the last decades (see, e.g., Korostelev and Tsybakov [1993],
Mair and Ruymgaart [1996], Evans and Stark [2002], Kaipio and Somersalo [2005], Bissantz
et al. [2007] and references therein). Mathematical statistics has paid special attention to
oracle or minimax optimal nonparametric estimation and adaptation in the framework of
inverse problems (see Efromovich and Koltchinskii [2001], Cavalier et al. [2003], Cavalier
[2008] and Hoffmann and Reiß [2008], to name but a few). Nonparametric estimation in general
requires the choice of a tuning parameter, which is challenging in practice. Oracle and
minimax estimation is achieved, respectively, if the tuning parameter is set to an optimal
value which relies either on knowledge of the unknown parameter of interest or of certain
characteristics of it (such as smoothness). Since both the parameter and its smoothness are
unknown, it is necessary to design a feasible procedure to select the tuning parameter that
adapts to the unknown underlying function or to its regularity and achieves the oracle or
minimax rate. Among the most prominent approaches are, without doubt, model selection (cf.
Barron et al. [1999] and its exhaustive discussion in Massart [2007]), Stein's unbiased
risk estimation and its extensions (cf. Cavalier et al. [2002], Cavalier et al. [2002] or
Cavalier and Hengartner [2005]), Lepski's method (see, e.g., Lepskij [1990], Birgé [2001],
Efromovich and Koltchinskii [2001] or Mathé [2006]) or combinations of the aforementioned
strategies (cf. Goldenshluger and Lepski [2011] and Comte and Johannes [2012]).
On the other hand, it seems natural to adopt a Bayesian point of view where the
tuning parameter can be endowed with a prior. As the theory for a general inverse
problem - with a possibly unknown or noisy operator - is technically highly involved,
we consider in this paper as a starting point an indirect Gaussian regression which is
well known to be equivalent to an indirect Gaussian sequence space model (in a Le Cam
[1964] sense, see, e.g., Brown and Low [1996] for the direct case and Meister [2011] for
the indirect case).

Let $\ell^2$ be the Hilbert space of square summable real valued sequences endowed with the
usual inner product $\langle\cdot,\cdot\rangle_{\ell^2}$ and associated norm $\|\cdot\|_{\ell^2}$.
In an indirect Gaussian sequence space model (iGSSM) one aim is to recover a parameter
sequence $\theta = (\theta_j)_{j\ge1} \in \ell^2$ from a transformed version
$(\lambda_j\theta_j)_{j\ge1}$ that is blurred by a Gaussian white noise. Precisely, an
observable sequence of random variables $(Y_j)_{j\ge1}$, $Y$ for short, obeys an indirect
Gaussian sequence space model, if

$$Y_j = \lambda_j\theta_j + \sqrt{\varepsilon}\,\xi_j, \quad j \in \mathbb{N}, \tag{1.1}$$

where $\{\xi_j\}_{j\ge1}$ are unobservable error terms, which are independent and standard
normally distributed, and $0 < \varepsilon < 1$ is the noise level. The sequence
$\lambda = (\lambda_j)_{j\ge1}$ represents the operator that transforms the signal $\theta$.
In the particular case of a constant sequence $\lambda$ the sequence space model is called
direct, while it is called an indirect sequence space model if the sequence $\lambda$ tends
to zero. We assume throughout the paper that the sequence $\lambda$ is bounded.
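To make the observation scheme (1.1) concrete, the following is a minimal simulation sketch on a truncated index set. The function name `simulate_igssm`, the truncation length and the particular polynomial choices of the signal and of the operator sequence are assumptions made for illustration only; they are not taken from the paper.

```python
import math
import random


def simulate_igssm(theta, lam, eps, rng):
    """Draw Y_j = lam_j * theta_j + sqrt(eps) * xi_j on a truncated index set."""
    if not 0.0 < eps < 1.0:
        raise ValueError("noise level must satisfy 0 < eps < 1")
    sd = math.sqrt(eps)
    # xi_j are independent standard normal error terms.
    return [l * t + sd * rng.gauss(0.0, 1.0) for l, t in zip(lam, theta)]


n = 50                                          # truncation level (assumption)
theta = [j ** -1.5 for j in range(1, n + 1)]    # hypothetical signal
lam = [j ** -1.0 for j in range(1, n + 1)]      # hypothetical operator, tends to zero
y = simulate_igssm(theta, lam, eps=1e-4, rng=random.Random(0))
```

Since the operator sequence tends to zero, the signal-to-noise ratio of the later coordinates deteriorates, which is the ill-posedness the indirect model is meant to capture.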

In this paper we adopt a Bayesian approach, where the parameter sequence of interest
$\theta = (\theta_j)_{j\ge1}$ itself is a realisation of a random variable
$\vartheta = (\vartheta_j)_{j\ge1}$ and the observable random variable $Y = (Y_j)_{j\ge1}$
satisfies

$$Y_j = \lambda_j\vartheta_j + \sqrt{\varepsilon}\,\xi_j, \quad j \in \mathbb{N}, \tag{1.2}$$

with independent and standard normally distributed error terms $\{\xi_j\}_{j\ge1}$ and noise
level $0 < \varepsilon < 1$. Throughout the paper we assume that the random parameters
$\{\vartheta_j\}_{j\ge1}$ and the error terms $\{\xi_j\}_{j\ge1}$ are independent.
Consequently, (1.2) and a specification of the prior distribution of $\vartheta$ determine
completely the joint distribution of $Y$ and $\vartheta$. For a broader overview on Bayesian
procedures we refer the reader to the monograph by Robert [2007].

Typical prior specifications studied in the direct sequence space model literature are
compound priors, also known as sieve priors (see, e.g., Zhao [2000], Shen and Wasserman
[2001] or Arbel et al. [2013]), Gaussian series priors (cf. Freedman [1999], Cox [1993]
or Castillo [2008]), block priors (cf. Gao and Zhou [2014]), countable mixtures of normal
priors (cf. Belitser and Ghosal [2003]) and finite mixtures of normal and Dirac priors
(e.g. Abramovich et al. [1998]). In the context of an iGSSM, Knapik et al. [2011] and
Knapik et al. [2014] consider Gaussian series priors and continuous mixtures of Gaussian
series priors, respectively.

By considering an iGSSM we derive in this paper theoretical properties of a Bayes procedure
with a sieve prior specification from a frequentist point of view, meaning that there
exists a true parameter value $\theta^\circ = (\theta^\circ_j)_{j\ge1}$ associated with the
data generating process of $Y$. A broader overview of frequentist asymptotic properties of
nonparametric Bayes procedures can be found, for example, in Ghosh and Ramamoorthi [2003],
while direct and indirect models, respectively, are considered by, e.g., Zhao [2000],
Belitser and Ghosal [2003], Castillo [2008] and Gao and Zhou [2014], and, e.g., Knapik et al.
[2011] and Knapik et al. [2014]. Bayesian procedures in the context of slightly different
Gaussian inverse problems and their asymptotic properties are studied in, e.g., Agapiou
et al. [2013] and Florens and Simoni [2014]. However, our special attention is given to
posterior consistency and optimal posterior concentration in an oracle or minimax sense,
which we elaborate in the following.
In this paper we consider a sieve prior family $\{P_{\vartheta^m}\}_m$ where the prior
distribution $P_{\vartheta^m}$ of the random parameter sequence
$\vartheta^m = (\vartheta^m_j)_{j\ge1}$ is Gaussian and degenerate for all $j > m$. More
precisely, the first $m$ coordinates $\{\vartheta^m_j\}_{j=1}^m$ are independent and normally
distributed random variables while the remaining coordinates $\{\vartheta^m_j\}_{j>m}$ are
degenerate at a point. Note that the dimension parameter $m$ plays the role of a tuning
parameter. Assuming an observation $Y = (Y_j)_{j\ge1}$ satisfying
$Y_j = \lambda_j\vartheta^m_j + \sqrt{\varepsilon}\,\xi_j$, we denote by $P_{\vartheta^m|Y}$
the corresponding posterior distribution of $\vartheta^m$ given $Y$. Given a prior sub-family
$\{P_{\vartheta^{m_\varepsilon}}\}_{m_\varepsilon}$ in dependence of the noise level
$\varepsilon$, our objective is the study of frequentist properties of the associated
posterior sub-family $\{P_{\vartheta^{m_\varepsilon}|Y}\}_{m_\varepsilon}$. To be more precise,
let $\theta^\circ$ be the realisation of the random parameter $\vartheta$ associated with the
data-generating distribution and denote by $E_{\theta^\circ}$ the corresponding expectation.
A quantity $\Phi_\varepsilon$ which is up to a constant a lower and an upper bound of the
concentration of the posterior sub-family
$\{P_{\vartheta^{m_\varepsilon}|Y}\}_{m_\varepsilon}$, i.e.,

$$\lim_{\varepsilon\to0} E_{\theta^\circ} P_{\vartheta^{m_\varepsilon}|Y}\big(K^{-1}\Phi_\varepsilon \le \|\vartheta^{m_\varepsilon}-\theta^\circ\|_{\ell^2}^2 \le K\Phi_\varepsilon\big) = 1 \quad\text{with } 1 \le K < \infty, \tag{1.3}$$

is called exact posterior concentration (see, e.g., Barron et al. [1999], Ghosal et al. [2000]
or Castillo [2008] for a broader discussion of the concept of posterior concentration). We
shall emphasise that the derivation of the posterior concentration relies strongly on tail
bounds for non-central $\chi^2$ distributions established in Birgé [2001]. Moreover, if
$\Phi_\varepsilon \to 0$ as $\varepsilon \to 0$ then the lower and upper bound given in (1.3)
establish posterior consistency and $\Phi_\varepsilon$ is called exact posterior concentration
rate. Obviously, the exact rate depends on the prior sub-family
$\{P_{\vartheta^{m_\varepsilon}}\}_{m_\varepsilon}$ as well as on the unknown parameter
$\theta^\circ$.

In the spirit of a frequentist oracle approach, given a parameter $\theta^\circ$ we derive in
this paper a prior sub-family $\{P_{\vartheta^{m^\circ_\varepsilon}}\}_{m^\circ_\varepsilon}$
with smallest possible exact posterior concentration rate $\Phi^\circ_\varepsilon$, which we
call, respectively, an oracle prior sub-family and an oracle posterior concentration rate. On
the other hand, following a minimax approach, Johannes and Schwarz [2013], for example, derive
the minimax rate of convergence $\Phi^\star_\varepsilon$ of the maximal mean integrated
squared error (MISE) over a given class $\Theta_a^r$ of parameters (introduced below). We
construct a sub-family $\{P_{\vartheta^{m^\star_\varepsilon}}\}_{m^\star_\varepsilon}$ of prior
distributions with exact posterior concentration rate $\Phi^\star_\varepsilon$ uniformly over
$\Theta_a^r$ which does not depend on the true parameter $\theta^\circ$ but only on the set of
possible parameters $\Theta_a^r$. It is interesting to note that in a direct GSSM Castillo
[2008] establishes up to a constant the minimax rate as an upper bound of the posterior
concentration, while the derived lower bound features a logarithmic factor compared to the
minimax rate. Arbel et al. [2013], for example, in a direct GSSM and Knapik et al. [2014] in
an indirect GSSM provide only upper bounds of the posterior concentration rate which differ
up to a logarithmic factor from the minimax rate. We shall emphasise that the prior
specifications we propose in this paper lead to exact posterior concentration rates that are
optimal in an oracle or minimax sense over certain classes of parameters not only in the
direct model but also in the more general indirect model. However, both the oracle and the
minimax sieve prior are infeasible in practice since they rely on the knowledge of either
$\theta^\circ$ itself or its smoothness.

Our main contribution in this paper is the construction of a hierarchical prior
$P_{\vartheta^M}$ that is adaptive. This means that, given a parameter
$\theta^\circ \in \ell^2$ or a class $\Theta_a^r \subset \ell^2$ of parameters, the posterior
distribution $P_{\vartheta^M|Y}$ contracts, respectively, at the oracle rate or the minimax
rate over $\Theta_a^r$ while the hierarchical prior $P_{\vartheta^M}$ relies neither on the
knowledge of $\theta^\circ$ nor on the class $\Theta_a^r$. Let us briefly elaborate on the
hierarchical structure of the prior, which induces an additional prior on the tuning parameter
$m$, i.e., $m$ itself is a realisation of a random variable $M$. We construct a prior for $M$
such that the marginal posterior for $\vartheta^M$ (obtained by integrating out $M$ with
respect to its posterior) contracts exactly at the oracle concentration rate. This is possible
for every $\theta^\circ$ whose components differ from the components of the prior mean
infinitely many times. In addition, for every $\theta^\circ$ in the class $\Theta_a^r$ we show
that the posterior distribution $P_{\vartheta^M|Y}$ contracts at least at the minimax rate and
that the corresponding Bayes estimate is minimax-optimal. Thereby, the proposed Bayesian
procedure is minimax adaptive over the class $\Theta_a^r$.

Although adaptation has attracted remarkable interest in the frequentist literature,
only few contributions are available in the Bayesian literature on Gaussian sequence
space models. In a direct model Belitser and Ghosal [2003], Szabó et al. [2013], Arbel
et al. [2013] and Gao and Zhou [2014] derive Bayesian methods that achieve minimax
adaptation while in an indirect Gaussian sequence space model, to the best of our knowledge,
only Knapik et al. [2014] have derived an adaptive Bayesian procedure. In this
paper, we extend previous results on adaptation obtained through sieve priors to the
indirect Gaussian sequence space model. This requires a specification of the prior on the
tuning parameter $M$ different from the one used by, e.g., Zhao [2000] and Arbel et al.
[2013]. Interestingly, our novel prior specification on $M$ improves the general results of
Arbel et al. [2013] since it yields adaptation without a rate loss (given by
a logarithmic factor) even in the direct model. Compared to Knapik et al. [2014], our
procedure relies on a sieve prior while they use a family of Gaussian priors for $\vartheta$
that is not degenerate in any component of $\vartheta$ and where the hyper-parameter is
represented by the smoothness of the prior variance. Their procedure is minimax-adaptive up
to a logarithmic deterioration of the minimax rate on certain smoothness classes for
$\theta^\circ$, which is avoided by our procedure.

The rest of the paper is organised as follows. The prior scheme is specified in Section
2. In Section 3 we derive the lower and upper bound of the posterior concentration,
the oracle posterior concentration rate and the minimax rate. In Section 4 we introduce
a prior distribution $P_M$ for the random dimension $M$ and we prove adaptation of the
hierarchical Bayes procedure. The proofs are given in the appendix.


2 Basic model assumptions 


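Let us consider a Gaussian prior distribution for the parameter
$\vartheta = (\vartheta_j)_{j\ge1}$, that is, $\{\vartheta_j\}_{j\ge1}$ are independent,
normally distributed with prior means $(\theta^\times_j)_{j\ge1}$ and prior variances
$(\varsigma_j)_{j\ge1}$. Standard calculus shows that the posterior distribution of
$\vartheta$ given $Y = (Y_j)_{j\ge1}$ is Gaussian, that is, given $Y$,
$\{\vartheta_j\}_{j\ge1}$ are conditionally independent, normally distributed random variables
with posterior variance
$\sigma_j := \mathrm{Var}(\vartheta_j \mid Y) = (\lambda_j^2\varepsilon^{-1} + \varsigma_j^{-1})^{-1}$
and posterior mean
$\theta^Y_j := E[\vartheta_j \mid Y] = \sigma_j(\varsigma_j^{-1}\theta^\times_j + \lambda_j\varepsilon^{-1}Y_j)$,
for all $j \in \mathbb{N}$. Taking this as a starting point, we construct a sequence of
hierarchical sieve prior distributions. To be more precise, let us denote by $\delta_x$ the
Dirac measure in the point $x$. Given $m \in \mathbb{N}$, we consider the independent random
variables $\{\vartheta^m_j\}_{j\ge1}$ with marginal distributions

$$\vartheta^m_j \sim \mathcal{N}(\theta^\times_j, \varsigma_j),\ 1 \le j \le m \quad\text{and}\quad \vartheta^m_j \sim \delta_{\theta^\times_j},\ m < j, \tag{2.1}$$

resulting in the degenerate prior distribution $P_{\vartheta^m}$. Here, we use the notation
$\vartheta^m = (\vartheta^m_j)_{j\ge1}$. Consequently, $\{\vartheta^m_j\}_{j\ge1}$ are
conditionally independent given $Y$ and their posterior distribution $P_{\vartheta^m|Y}$ is
Gaussian with mean $\theta^Y_j$ and variance $\sigma_j$ for $1 \le j \le m$ while being
degenerate on $\theta^\times_j$ for $j > m$.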
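The conjugate update described above can be sketched in a few lines. The helper `sieve_posterior` below is a hypothetical name, and the sketch assumes the standard Gaussian conjugacy formulas for the posterior variance $(\lambda_j^2/\varepsilon + 1/\varsigma_j)^{-1}$ and the posterior mean, applied to the first $m$ coordinates of a truncated sequence; coordinates beyond $m$ stay at the prior mean with zero variance.

```python
def sieve_posterior(y, lam, eps, prior_mean, prior_var, m):
    """Posterior means and variances under the sieve prior with dimension m.

    Coordinates j <= m receive the Gaussian conjugate update; coordinates
    j > m stay degenerate at the prior mean (posterior variance 0).
    """
    means, variances = [], []
    for j, (yj, lj, tj, vj) in enumerate(zip(y, lam, prior_mean, prior_var), start=1):
        if j <= m:
            s = 1.0 / (lj * lj / eps + 1.0 / vj)          # posterior variance
            means.append(s * (tj / vj + lj * yj / eps))   # posterior mean
            variances.append(s)
        else:
            means.append(tj)
            variances.append(0.0)
    return means, variances
```

For instance, with `y = [2.0, 5.0]`, unit operator, `eps = 0.5`, zero prior mean, unit prior variance and `m = 1`, the first coordinate has posterior variance 1/3 and posterior mean 4/3, while the second stays at the prior mean.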

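Let $\mathbb{1}_A$ denote the indicator function which takes the value one if the condition
$A$ holds true, and the value zero otherwise. We consider the posterior mean
$\hat\theta^m = (\hat\theta^m_j)_{j\ge1} := E[\vartheta^m \mid Y]$ given for $j \ge 1$ by
$\hat\theta^m_j := \theta^Y_j \mathbb{1}_{\{j \le m\}} + \theta^\times_j \mathbb{1}_{\{j > m\}}$
as Bayes estimator of $\theta$. We shall emphasise an improper specification of the prior,
that is, $\theta^\times = (\theta^\times_j)_{j\ge1} = 0$ and
$\varsigma = (\varsigma_j)_{j\ge1} = \infty$. Obviously, in this situation
$\theta^Y = Y/\lambda = (Y_j/\lambda_j)_{j\ge1}$ and
$\sigma = (\sigma_j)_{j\ge1} = (\varepsilon\lambda_j^{-2})_{j\ge1}$ are the posterior mean and
variance sequences, respectively. Consequently, under the improper prior specification, for
each $m \in \mathbb{N}$ the posterior mean $\hat\theta^m = E[\vartheta^m \mid Y]$ of
$\vartheta^m$ corresponds to an orthogonal projection estimator, i.e.,
$\hat\theta^m = (Y_j/\lambda_j \mathbb{1}_{\{1 \le j \le m\}})_{j\ge1}$.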

From a Bayesian point of view the thresholding parameter $m$ is a hyper-parameter and
hence, we may complete the prior specification by introducing a prior distribution on
it. Consider a random thresholding parameter $M$ taking its values in
$\{1, \dots, G_\varepsilon\}$ for some $G_\varepsilon \in \mathbb{N}$ with prior distribution
$P_M$. Both $G_\varepsilon$ and $P_M$ will be specified in Section 4. Moreover, the
distribution of the random variables $\{Y_j\}_{j\ge1}$ and $\{\vartheta^M_j\}_{j\ge1}$
conditionally on $M$ are determined by

$$Y_j = \lambda_j\vartheta^M_j + \sqrt{\varepsilon}\,\xi_j \quad\text{and}\quad \vartheta^M_j = \theta^\times_j + \sqrt{\varsigma_j}\,\eta_j \mathbb{1}_{\{1 \le j \le M\}}$$

where $\{\xi_j, \eta_j\}_{j\ge1}$ are iid. standard normal random variables independent of $M$.
Furthermore, the posterior mean $\hat\theta := E[\vartheta^M \mid Y]$ satisfies
$\hat\theta_j = \theta^\times_j$ for $j > G_\varepsilon$ and
$\hat\theta_j = \theta^\times_j P(1 \le M < j \mid Y) + \theta^Y_j P(j \le M \le G_\varepsilon \mid Y)$
for all $1 \le j \le G_\varepsilon$. It is important to note that the marginal posterior
distribution $P_{\vartheta^M|Y}$ of $\vartheta^M = (\vartheta^M_j)_{j\ge1}$ given the
observation $Y$ depends on the prior specification and the observation only, and hence it is
fully data-driven. Revisiting the improper prior specification introduced above, the
data-driven Bayes estimator equals a shrunk orthogonal projection estimator. More precisely,
we have
$\hat\theta_j = P(j \le M \le G_\varepsilon \mid Y) \times Y_j/\lambda_j \mathbb{1}_{\{1 \le j \le G_\varepsilon\}}$.
Interestingly, rather than using the data to select the dimension parameter $m$ in the set of
possible values $\{1, \dots, G_\varepsilon\}$, the Bayes estimator uses all components, up to
$G_\varepsilon$, shrunk by a weight decreasing with the index.
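The shrinkage form of the data-driven estimator can be illustrated as follows. The sketch assumes that a posterior probability mass function of $M$ on $\{1, \dots, G_\varepsilon\}$ is already available as a list `pm_post` (how it is obtained is specified in Section 4); the function name `hierarchical_estimate` is hypothetical.

```python
def hierarchical_estimate(post_mean, prior_mean, pm_post):
    """Mix prior mean and coordinate-wise posterior mean with tail weights.

    The weight of coordinate j is P(j <= M <= G | Y), with G = len(pm_post);
    its complement P(1 <= M < j | Y) multiplies the prior mean.
    """
    G = len(pm_post)
    tail = [0.0] * G            # tail[j - 1] = P(j <= M <= G | Y)
    acc = 0.0
    for j in range(G, 0, -1):
        acc += pm_post[j - 1]
        tail[j - 1] = acc
    estimate = []
    for j, (ty, tx) in enumerate(zip(post_mean, prior_mean), start=1):
        w = tail[j - 1] if j <= G else 0.0   # coordinates beyond G keep the prior mean
        estimate.append(tx * (1.0 - w) + ty * w)
    return estimate
```

The tail weights are non-increasing in $j$ by construction, which is the "weight decreasing with the index" mentioned in the text.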


3 Optimal concentration rate 


3.1 Consistency 

Note that conditional on $Y$ the random variables
$\{\vartheta^m_j - \theta^\circ_j\}_{j=1}^m$ are independent and normally distributed with
conditional mean $\theta^Y_j - \theta^\circ_j$ and conditional variance $\sigma_j$. The
next assertion presents a version of tail bounds for sums of independent squared Gaussian
random variables. It is shown in the appendix using a result due to Birgé [2001] which
can be shown along the lines of the proof of Lemma 1 in Laurent et al. [2012].

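Lemma 3.1. Let $\{X_j\}_{j\ge1}$ be independent and normally distributed r.v. with mean
$\alpha_j \in \mathbb{R}$ and standard deviation $\beta_j \ge 0$, $j \in \mathbb{N}$. For
$m \in \mathbb{N}$ set $S_m := \sum_{j=1}^m X_j^2$ and consider
$v_m \ge \sum_{j=1}^m \beta_j^2$, $t_m \ge \max_{1\le j\le m}\beta_j^2$ and
$r_m \ge \sum_{j=1}^m \alpha_j^2$. Then for all $c \ge 0$ we have

$$\sup_{m\ge1} \exp\Big(\frac{c(c\wedge1)(v_m + 2r_m)}{4t_m}\Big)\, P\big(S_m - E S_m \le -c(v_m + 2r_m)\big) \le 1; \tag{3.1}$$

$$\sup_{m\ge1} \exp\Big(\frac{c(c\wedge1)(v_m + 2r_m)}{4t_m}\Big)\, P\Big(S_m - E S_m \ge \frac{3c}{2}(v_m + 2r_m)\Big) \le 1. \tag{3.2}$$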
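A quick Monte Carlo sanity check of tail bounds of this type is sketched below. It assumes the bounds in the form $P(S_m - ES_m \le -c(v_m + 2r_m)) \le \exp(-c(c\wedge1)(v_m+2r_m)/(4t_m))$ and the analogous upper-tail statement with threshold $(3c/2)(v_m + 2r_m)$, taking $v_m$, $t_m$ and $r_m$ equal to their smallest admissible values. The function name and the chosen parameter values are illustrative assumptions, and a simulation can of course only fail to contradict the lemma, not prove it.

```python
import math
import random


def tail_bounds_check(alpha, beta, c, n_sim, rng):
    """Empirical tail frequencies of S_m = sum X_j^2 versus the exponential bound.

    alpha, beta: means and (strictly positive) standard deviations of the X_j.
    Returns (lower_tail_freq, upper_tail_freq, bound).
    """
    v = sum(b * b for b in beta)           # v_m = sum of variances
    t = max(b * b for b in beta)           # t_m = largest variance
    r = sum(a * a for a in alpha)          # r_m = sum of squared means
    mean_s = v + r                         # E S_m = sum(beta_j^2 + alpha_j^2)
    bound = math.exp(-c * min(c, 1.0) * (v + 2.0 * r) / (4.0 * t))
    lower = upper = 0
    for _ in range(n_sim):
        s = sum(rng.gauss(a, b) ** 2 for a, b in zip(alpha, beta))
        if s - mean_s <= -c * (v + 2.0 * r):
            lower += 1
        if s - mean_s >= 1.5 * c * (v + 2.0 * r):
            upper += 1
    return lower / n_sim, upper / n_sim, bound


lo_freq, up_freq, bound = tail_bounds_check(
    alpha=[0.5] * 10, beta=[1.0] * 10, c=1.0, n_sim=2000, rng=random.Random(1)
)
```

In this configuration both empirical frequencies should lie well below the bound, which reflects that the exponential inequalities are not tight for moderate $m$.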



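A major step towards establishing a concentration rate of the posterior distribution
consists in finding a finite sample bound for a fixed $m \in \mathbb{N}$. We express these
bounds in terms of

$$\mathfrak{b}_m := \sum_{j>m}(\theta^\circ_j - \theta^\times_j)^2, \quad \sigma_{(m)} := \max_{1\le j\le m}\sigma_j, \quad m\bar\sigma_m := \sum_{j=1}^m \sigma_j \quad\text{with } \sigma_j = (\lambda_j^2\varepsilon^{-1} + \varsigma_j^{-1})^{-1},$$

$$\mathfrak{r}_m := \sum_{j=1}^m \big(E_{\theta^\circ}[\theta^Y_j] - \theta^\circ_j\big)^2 = \sum_{j=1}^m \sigma_j^2\varsigma_j^{-2}(\theta^\times_j - \theta^\circ_j)^2.$$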
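Proposition 3.2. For all $m \in \mathbb{N}$, for all $\varepsilon > 0$ and for all
$0 < c < 1/5$ we have

$$E_{\theta^\circ} P_{\vartheta^m|Y}\big(\|\vartheta^m - \theta^\circ\|_{\ell^2}^2 > \mathfrak{b}_m + 3m\bar\sigma_m + 3m\sigma_{(m)}/2 + 4\mathfrak{r}_m\big) \le 2\exp(-m/36); \tag{3.3}$$

$$E_{\theta^\circ} P_{\vartheta^m|Y}\big(\|\vartheta^m - \theta^\circ\|_{\ell^2}^2 < \mathfrak{b}_m + m\bar\sigma_m - 4c(m\sigma_{(m)} + \mathfrak{r}_m)\big) \le 2\exp(-c^2m/2). \tag{3.4}$$

The desired convergence to zero of all the aforementioned sequences makes it necessary to
consider an appropriate sub-family $\{P_{\vartheta^{m_\varepsilon}}\}_{m_\varepsilon}$ in
dependence of the noise level $\varepsilon$, notably introducing consequently sub-sequences
$(\sigma_{(m_\varepsilon)})_{m_\varepsilon\ge1}$ and
$(\mathfrak{r}_{m_\varepsilon})_{m_\varepsilon\ge1}$.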

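Assumption A.1. There exist constants
$0 < \varepsilon_\circ := \varepsilon_\circ(\theta^\circ, \lambda, \theta^\times, \varsigma) < 1$
and $1 \le K := K(\theta^\circ, \lambda, \theta^\times, \varsigma) < \infty$ such that the sieve
sub-family $\{P_{\vartheta^{m_\varepsilon}}\}_{m_\varepsilon}$ of prior distributions satisfies
the condition
$\sup_{0<\varepsilon<\varepsilon_\circ} (\mathfrak{r}_{m_\varepsilon} \vee m_\varepsilon\sigma_{(m_\varepsilon)}) / (\mathfrak{b}_{m_\varepsilon} \vee m_\varepsilon\bar\sigma_{m_\varepsilon}) \le K$.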

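The following corollary can be immediately deduced from Proposition 3.2 and we omit
its proof.

Corollary 3.3. Under Assumption A.1, for all $0 < \varepsilon < \varepsilon_\circ$ and
$0 < c < 1/(8K)$ it holds that

$$E_{\theta^\circ} P_{\vartheta^{m_\varepsilon}|Y}\big(\|\vartheta^{m_\varepsilon} - \theta^\circ\|_{\ell^2}^2 > (4 + (11/2)K)[\mathfrak{b}_{m_\varepsilon} \vee m_\varepsilon\bar\sigma_{m_\varepsilon}]\big) \le 2\exp(-m_\varepsilon/36); \tag{3.5}$$

$$E_{\theta^\circ} P_{\vartheta^{m_\varepsilon}|Y}\big(\|\vartheta^{m_\varepsilon} - \theta^\circ\|_{\ell^2}^2 < (1 - 8cK)[\mathfrak{b}_{m_\varepsilon} \vee m_\varepsilon\bar\sigma_{m_\varepsilon}]\big) \le 2\exp(-c^2m_\varepsilon/2). \tag{3.6}$$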

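Note that the sequence
$(\mathfrak{b}_{m_\varepsilon} \vee m_\varepsilon\bar\sigma_{m_\varepsilon})_{m_\varepsilon\ge1}$
generally does not converge to zero. However, supposing that $m_\varepsilon \to \infty$ as
$\varepsilon \to 0$, it follows from the dominated convergence theorem that
$\mathfrak{b}_{m_\varepsilon} = o(1)$. Hence, assuming additionally that
$m_\varepsilon\bar\sigma_{m_\varepsilon} = o(1)$ holds true is sufficient to ensure that
$(\mathfrak{b}_{m_\varepsilon} \vee m_\varepsilon\bar\sigma_{m_\varepsilon})_{m_\varepsilon\ge1}$
converges to zero and it is indeed a posterior concentration rate. The next assertion
summarises this result and we omit its elementary proof.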

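Proposition 3.4 (Posterior consistency). Let Assumption A.1 be satisfied. If
$m_\varepsilon \to \infty$ and $m_\varepsilon\bar\sigma_{m_\varepsilon} = o(1)$ as
$\varepsilon \to 0$, then

$$\lim_{\varepsilon\to0} E_{\theta^\circ} P_{\vartheta^{m_\varepsilon}|Y}\big((10K)^{-1}[\mathfrak{b}_{m_\varepsilon} \vee m_\varepsilon\bar\sigma_{m_\varepsilon}] \le \|\vartheta^{m_\varepsilon} - \theta^\circ\|_{\ell^2}^2 \le 10K[\mathfrak{b}_{m_\varepsilon} \vee m_\varepsilon\bar\sigma_{m_\varepsilon}]\big) = 1.$$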

The last assertion shows that
$(\mathfrak{b}_{m_\varepsilon} \vee m_\varepsilon\bar\sigma_{m_\varepsilon})_{m_\varepsilon\ge1}$
is up to a constant a lower and upper bound of the concentration rate associated with the
sieve sub-family $\{P_{\vartheta^{m_\varepsilon}}\}_{m_\varepsilon}$ of prior distributions. It
is easily shown that it also provides an upper bound of the frequentist risk of the associated
Bayes estimator.

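Proposition 3.5 (Bayes estimator consistency). Let the assumptions of Proposition
3.4 be satisfied. Consider the Bayes estimator
$\hat\theta^{m_\varepsilon} := E[\vartheta^{m_\varepsilon} \mid Y]$, then

$$E_{\theta^\circ}\|\hat\theta^{m_\varepsilon} - \theta^\circ\|_{\ell^2}^2 \le (2 + K)[\mathfrak{b}_{m_\varepsilon} \vee m_\varepsilon\bar\sigma_{m_\varepsilon}]$$

and consequently $E_{\theta^\circ}\|\hat\theta^{m_\varepsilon} - \theta^\circ\|_{\ell^2}^2 = o(1)$
as $\varepsilon \to 0$.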
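The previous results are obtained under Assumption A.1. However, it may be difficult
to verify whether a given sub-family of priors
$\{P_{\vartheta^{m_\varepsilon}}\}_{m_\varepsilon}$ satisfies such an assumption.
Therefore, we now introduce an assumption which states a more precise requirement on
the prior variance and that can be more easily verified. Define for $j, m \in \mathbb{N}$

$$\Lambda_j := \lambda_j^{-2}, \quad \Lambda_{(m)} := \max_{1\le j\le m}\Lambda_j, \quad \bar\Lambda_m := m^{-1}\sum_{j=1}^m \Lambda_j \quad\text{and}\quad \Phi_\varepsilon^m := [\mathfrak{b}_m \vee \varepsilon m\bar\Lambda_m].$$

Assumption A.2. Let
$G_\varepsilon := \max\{1 \le m \le \lfloor\varepsilon^{-1}\rfloor : \varepsilon\Lambda_{(m)} \le \Lambda_1\}$.
There exists a finite constant $d > 0$ such that
$\varsigma_j \ge d[\varepsilon^{1/2}\Lambda_j^{1/2} \vee \varepsilon\Lambda_j]$ for all
$1 \le j \le G_\varepsilon$ and for all $\varepsilon \in (0,1)$.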
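The quantities entering Assumption A.2 are easy to evaluate numerically. The sketch below computes $G_\varepsilon$ on a truncated operator sequence and checks the lower bound on the prior variances. It assumes that the range of $m$ in the definition of $G_\varepsilon$ is capped at $\lfloor 1/\varepsilon\rfloor$ (in addition to the truncation length of the supplied list); the function names `sieve_cap` and `satisfies_a2` are hypothetical.

```python
import math


def sieve_cap(lam, eps):
    """G_eps = max{1 <= m <= floor(1/eps) : eps * max_{j<=m} Lambda_j <= Lambda_1}.

    Lambda_j = lam_j ** -2; the search is restricted to the supplied coordinates.
    """
    Lam = [1.0 / (l * l) for l in lam]
    limit = min(len(Lam), math.floor(1.0 / eps))
    G, running_max = 0, 0.0
    for m in range(1, limit + 1):
        running_max = max(running_max, Lam[m - 1])   # Lambda_(m)
        if eps * running_max <= Lam[0]:
            G = m
    return G


def satisfies_a2(prior_var, lam, eps, d):
    """Check prior_var_j >= d * max(sqrt(eps * Lambda_j), eps * Lambda_j) for j <= G_eps."""
    G = sieve_cap(lam, eps)
    for j in range(G):
        L = 1.0 / (lam[j] ** 2)
        if prior_var[j] < d * max(math.sqrt(eps * L), eps * L):
            return False
    return True
```

With the geometric choice `lam = [2.0 ** -(j - 1) for j in range(1, 11)]` and `eps = 2.0 ** -6`, the cap is $G_\varepsilon = 4$, and a constant prior variance of 1 satisfies the bound with $d = 1$ while a constant prior variance of 0.1 does not.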

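Note that in the last Assumption the defining set of $G_\varepsilon$ is not empty, since
$\varepsilon\Lambda_{(1)} \le \Lambda_1$ for all $\varepsilon \le 1$. Moreover, under
Assumption A.2, by some elementary algebra, it is readily verified for all
$1 \le j \le G_\varepsilon$ that

$$1 \le \varepsilon\Lambda_j/\sigma_j \le (1 + 1/d) \quad\text{and}\quad \sigma_j/\varsigma_j \le (1 \wedge d^{-1}\varepsilon^{1/2}\Lambda_j^{1/2})$$

which in turn implies for all $1 \le m \le G_\varepsilon$ that

$$\mathfrak{r}_m \le d^{-2}\|\theta^\times - \theta^\circ\|_{\ell^2}^2\,\varepsilon\Lambda_{(m)}, \quad 1 \le \varepsilon m\Lambda_{(m)}(m\sigma_{(m)})^{-1} \quad\text{and}\quad 1 \le \varepsilon m\bar\Lambda_m(m\bar\sigma_m)^{-1} \le (1 + 1/d).$$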

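We will use these elementary bounds in the sequel without further reference. Returning
to the sieve sub-family $\{P_{\vartheta^{m_\varepsilon}}\}_{m_\varepsilon}$ of prior
distributions, if in addition to Assumption A.2 there exists a constant
$1 \le L := L(\theta^\circ, \lambda, \theta^\times) < \infty$ such that

$$\sup_{0<\varepsilon<1} \varepsilon m_\varepsilon\Lambda_{(m_\varepsilon)}(\Phi_\varepsilon^{m_\varepsilon})^{-1} \le L \tag{3.7}$$

and $\Phi_\varepsilon^{m_\varepsilon} = o(1)$ as $\varepsilon \to 0$ hold true, then the
sub-family $\{P_{\vartheta^{m_\varepsilon}}\}_{m_\varepsilon}$ satisfies Assumption A.1 with
$K := ((1 + d^{-1}) \vee d^{-2}\|\theta^\circ - \theta^\times\|_{\ell^2}^2)L$. Indeed, if
$\Phi_\varepsilon^{m_\varepsilon} = o(1)$ and, hence,
$\Phi_\varepsilon^{m_\varepsilon} \le \Lambda_1/L$ for all
$\varepsilon \in (0, \varepsilon_\circ)$, then $m_\varepsilon \le G_\varepsilon$ holds true for
all $\varepsilon \in (0, \varepsilon_\circ)$ since
$\varepsilon m_\varepsilon\Lambda_1 \le \varepsilon m_\varepsilon\Lambda_{(m_\varepsilon)} \le L\Phi_\varepsilon^{m_\varepsilon} \le \Lambda_1$
and thus $m_\varepsilon \le \lfloor\varepsilon^{-1}\rfloor$ and
$\varepsilon\Lambda_{(m_\varepsilon)} \le \Lambda_1$. In other words, for all
$\varepsilon \in (0, \varepsilon_\circ)$ we can apply Assumption A.2 and the claim follows
taking into account the aforementioned elementary bounds. Note further that the constant $K$
does not depend on the prior variances $\varsigma$ but only on the constant $d$ given by
Assumption A.2. The next assertion follows immediately from Corollary 3.3 and we omit its
proof.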

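Corollary 3.6. Under Assumption A.2 consider a sub-family
$\{P_{\vartheta^{m_\varepsilon}}\}_{m_\varepsilon}$ such that (3.7) and
$\Phi_\varepsilon^{m_\varepsilon} = o(1)$ as $\varepsilon \to 0$ are satisfied. Then there
exists $\varepsilon_\circ \in (0,1)$ such that for all $0 < \varepsilon < \varepsilon_\circ$
and $0 < c < 1/(8K)$ with
$K = ((1 + d^{-1}) \vee d^{-2}\|\theta^\circ - \theta^\times\|_{\ell^2}^2)L$ it holds that

$$E_{\theta^\circ} P_{\vartheta^{m_\varepsilon}|Y}\big(\|\vartheta^{m_\varepsilon} - \theta^\circ\|_{\ell^2}^2 > (4 + (11/2)K)\Phi_\varepsilon^{m_\varepsilon}\big) \le 2\exp(-m_\varepsilon/36); \tag{3.8}$$

$$E_{\theta^\circ} P_{\vartheta^{m_\varepsilon}|Y}\big(\|\vartheta^{m_\varepsilon} - \theta^\circ\|_{\ell^2}^2 < (1 - 8cK)(1 + d^{-1})^{-1}\Phi_\varepsilon^{m_\varepsilon}\big) \le 2\exp(-c^2m_\varepsilon/2). \tag{3.9}$$

The result implies consistency if $m_\varepsilon \to \infty$ as $\varepsilon \to 0$ but it
does not answer the question of an optimal rate in a satisfactory way.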
3.2 Oracle concentration rate

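Considering the sieve family $\{P_{\vartheta^m}\}_m$ of prior distributions, the sequence
$(\Phi_\varepsilon^{m_\varepsilon})_{m_\varepsilon\ge1}$ provides up to constants a lower and
upper bound for the posterior concentration rate for each sub-family
$\{P_{\vartheta^{m_\varepsilon}}\}_{m_\varepsilon}$ satisfying the conditions of Corollary 3.6.
Observe that the term $\mathfrak{b}_{m_\varepsilon}$ and hence the rate depends on the
parameter of interest $\theta^\circ$. Let us minimise the rate for each $\theta^\circ$
separately. For a sequence $(a_m)_{m\ge1}$ with minimal value in $A$ we set
$\arg\min_{m\in A}\{a_m\} := \min\{m : a_m \le a_k, \forall k \in A\}$ and define for all
$\varepsilon > 0$

$$m^\circ_\varepsilon := m^\circ_\varepsilon(\theta^\circ, \theta^\times, \lambda) := \arg\min_{m\ge1}\{\Phi_\varepsilon^m\} \quad\text{and}$$

$$\Phi^\circ_\varepsilon := \Phi^\circ_\varepsilon(\theta^\circ, \theta^\times, \lambda) := \Phi_\varepsilon^{m^\circ_\varepsilon} = \min_{m\ge1}\Phi_\varepsilon^m. \tag{3.10}$$

We may emphasise that $\Phi^\circ_\varepsilon = o(1)$ as $\varepsilon \to 0$. Indeed, for all
$\delta > 0$ there exists a dimension $m_\delta$ and a noise level $\varepsilon_\delta$ such
that
$\Phi^\circ_\varepsilon \le [\mathfrak{b}_{m_\delta} \vee \varepsilon m_\delta\bar\Lambda_{m_\delta}] \le \delta$
for all $0 < \varepsilon \le \varepsilon_\delta$. Obviously, given
$\theta^\circ \in \ell^2$ the rate $\Phi^\circ_\varepsilon$ is a lower bound for all posterior
concentration rates $\Phi_\varepsilon^{m_\varepsilon}$ associated with a prior sub-family
$\{P_{\vartheta^{m_\varepsilon}}\}_{m_\varepsilon}$ satisfying the conditions of Corollary 3.6.
Moreover, the next assertion establishes $\Phi^\circ_\varepsilon$ up to constants as upper and
lower bound for the concentration rate associated with the sub-family
$\{P_{\vartheta^{m^\circ_\varepsilon}}\}_{m^\circ_\varepsilon}$. Consequently,
$\Phi^\circ_\varepsilon$ is called oracle posterior concentration rate and
$\{P_{\vartheta^{m^\circ_\varepsilon}}\}_{m^\circ_\varepsilon}$ oracle prior sub-family. The
assertion follows again from Corollary 3.3 (with $c = 1/(9K)$) and we omit its proof.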
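The oracle dimension is a simple bias-variance trade-off and can be computed directly when $\theta^\circ$ is known. The sketch below assumes the rate $\Phi_\varepsilon^m = \max(\mathfrak{b}_m, \varepsilon\sum_{j\le m}\lambda_j^{-2})$ with $\mathfrak{b}_m$ the squared tail distance between $\theta^\circ$ and the prior mean, evaluated on a truncated sequence (so the tail sum beyond the truncation is ignored). The function name `oracle_dimension` is hypothetical.

```python
def oracle_dimension(theta0, prior_mean, lam, eps):
    """Smallest minimiser m of max(b_m, eps * sum_{j<=m} lam_j**-2) and the minimum.

    b_m = sum_{j>m} (theta0_j - prior_mean_j)**2 on the truncated sequence.
    """
    sq = [(t - p) ** 2 for t, p in zip(theta0, prior_mean)]
    bias = sum(sq)              # turns into b_m after removing the first m terms
    var_sum = 0.0               # m * bar(Lambda)_m = sum_{j<=m} Lambda_j
    best_m, best_phi = None, float("inf")
    for m in range(1, len(sq) + 1):
        bias = max(bias - sq[m - 1], 0.0)    # guard against rounding below zero
        var_sum += 1.0 / lam[m - 1] ** 2
        phi = max(bias, eps * var_sum)
        if phi < best_phi:                   # strict "<" keeps the smallest minimiser
            best_m, best_phi = m, phi
    return best_m, best_phi
```

For `theta0 = [2.0, 1.0, 0.0, 0.0]`, zero prior mean, unit operator and `eps = 0.5`, the rate equals 1 at both `m = 1` and `m = 2`, and the function returns the smaller index, in line with the arg min convention above.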

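Theorem 3.7 (Oracle posterior concentration rate). Suppose that Assumption A.2 holds
true and that there exists a constant
$1 \le L^\circ := L^\circ(\theta^\circ, \lambda, \theta^\times) < \infty$ such that

$$\sup_{0<\varepsilon<1} \varepsilon m^\circ_\varepsilon\Lambda_{(m^\circ_\varepsilon)}(\Phi^\circ_\varepsilon)^{-1} \le L^\circ. \tag{3.11}$$

If in addition $m^\circ_\varepsilon \to \infty$ as $\varepsilon \to 0$ and
$K^\circ := 10((1 + d^{-1}) \vee d^{-2}\|\theta^\circ - \theta^\times\|_{\ell^2}^2)L^\circ$, then

$$\lim_{\varepsilon\to0} E_{\theta^\circ} P_{\vartheta^{m^\circ_\varepsilon}|Y}\big((K^\circ)^{-1}\Phi^\circ_\varepsilon \le \|\vartheta^{m^\circ_\varepsilon} - \theta^\circ\|_{\ell^2}^2 \le K^\circ\Phi^\circ_\varepsilon\big) = 1.$$

Note that $m^\circ_\varepsilon \to \infty$ as $\varepsilon \to 0$ if and only if
$\mathfrak{b}_m > 0$ for all $m \ge 1$. Roughly speaking, the last assertion establishes
$\Phi^\circ_\varepsilon$ as oracle posterior concentration rate for all parameters of interest
$\theta^\circ$ with components differing from the components of the prior mean
$\theta^\times$ infinitely many times. However, we do not need this additional assumption to
prove the next assertion, which establishes $\Phi^\circ_\varepsilon$ as oracle rate for the
family $\{\hat\theta^m\}_m$ of Bayes estimators and shows that
$\hat\theta^{m^\circ_\varepsilon}$ is an oracle Bayes estimator.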

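Theorem 3.8 (Oracle Bayes estimator). Consider the family $\{\hat\theta^m\}_m$ of Bayes
estimators. Under Assumption A.2 we have (i)
$E_{\theta^\circ}\|\hat\theta^{m^\circ_\varepsilon} - \theta^\circ\|_{\ell^2}^2 \le (2 + d^{-2}\|\theta^\circ - \theta^\times\|_{\ell^2}^2)\Phi^\circ_\varepsilon$
and (ii)
$\inf_{m\ge1} E_{\theta^\circ}\|\hat\theta^m - \theta^\circ\|_{\ell^2}^2 \ge (1 + 1/d)^{-1}\Phi^\circ_\varepsilon$
for all $\varepsilon \in (0, \varepsilon_\circ)$.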
Note that the oracle choice $m^\circ_\varepsilon$ depends on the parameter of interest
$\theta^\circ$ and thus the oracle Bayes estimator $\hat\theta^{m^\circ_\varepsilon}$ as well
as the associated oracle sub-family
$\{P_{\vartheta^{m^\circ_\varepsilon}}\}_{m^\circ_\varepsilon}$ of prior distributions are
generally not feasible.

3.3 Minimax concentration rate 

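In the spirit of a minimax theory we are interested in the following in a uniform rate over a
class of parameters rather than optimising the rate for each $\theta^\circ$ separately. Given
a strictly positive and non-increasing sequence $a = (a_j)_{j\ge1}$ with $a_1 = 1$ and
$\lim_{j\to\infty} a_j = 0$ consider for $\theta \in \ell^2$ its weighted norm
$\|\theta\|_a^2 := \sum_{j\ge1}\theta_j^2/a_j$. We define $\ell^2_a$ as the completion of
$\ell^2$ with respect to $\|\cdot\|_a$. In order to formulate the optimality of the posterior
concentration rate let us define

$$m^\star_\varepsilon := m^\star_\varepsilon(a, \lambda) := \arg\min_{m\ge1}\{a_m \vee \varepsilon m\bar\Lambda_m\} \quad\text{and}$$

$$\Phi^\star_\varepsilon := \Phi^\star_\varepsilon(a, \lambda) := [a_{m^\star_\varepsilon} \vee \varepsilon m^\star_\varepsilon\bar\Lambda_{m^\star_\varepsilon}] \quad\text{for all } \varepsilon > 0. \tag{3.12}$$

We remark that $\Phi^\star_\varepsilon = o(1)$ and $m^\star_\varepsilon \to \infty$ as
$\varepsilon \to 0$ since $a$ is strictly positive and tends monotonically to zero. We assume
in the following that the parameter $\theta^\circ$ belongs to the ellipsoid
$\Theta_a^r := \{\theta \in \ell^2_a : \|\theta - \theta^\times\|_a^2 \le r\}$ and therefore,
$\mathfrak{b}_m(\theta^\circ) \le a_m r$. Note that
$\Phi^\circ_\varepsilon = \min_{m\ge1}[\mathfrak{b}_m \vee \varepsilon m\bar\Lambda_m] \le (1 \vee r)\min_{m\ge1}[a_m \vee \varepsilon m\bar\Lambda_m] = (1 \vee r)\Phi^\star_\varepsilon$
and $\|\theta^\circ - \theta^\times\|_{\ell^2}^2 \le r$, and hence from Theorem 3.8 it follows
that
$E_{\theta^\circ}\|\hat\theta^{m^\circ_\varepsilon} - \theta^\circ\|_{\ell^2}^2 \le (2 + r/d^2)(1 \vee r)\Phi^\star_\varepsilon$.
On the other hand, given an estimator $\hat\theta$ of $\theta$ let
$\sup_{\theta\in\Theta_a^r} E_\theta\|\hat\theta - \theta\|_{\ell^2}^2$ denote the maximal
mean integrated squared error over the class $\Theta_a^r$. It has been shown in Johannes and
Schwarz [2013] that $\Phi^\star_\varepsilon$ provides up to a constant a lower bound for the
maximal MISE over the class $\Theta_a^r$ (assuming a prior mean $\theta^\times = 0$) if the
next assumption is satisfied.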
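Assumption A.3. Let $a$ and $\lambda$ be sequences such that

$$0 < \kappa^\star := \kappa^\star(a, \lambda) := \inf_{0<\varepsilon<\varepsilon_\circ}\big\{(\Phi^\star_\varepsilon)^{-1}[a_{m^\star_\varepsilon} \wedge \varepsilon m^\star_\varepsilon\bar\Lambda_{m^\star_\varepsilon}]\big\} \le 1. \tag{3.13}$$

We may emphasise that under Assumption A.3 the rate
$\Phi^\star_\varepsilon = \Phi^\star_\varepsilon(a, \lambda)$ is optimal in a minimax sense and
the Bayes estimate $\hat\theta^{m^\circ_\varepsilon}$ attains the minimax rate up to a
constant. However, the dimension parameter $m^\circ_\varepsilon$ still depends on the
parameter of interest $\theta^\circ$. Therefore, let us consider the Bayes estimate
$\hat\theta^{m^\star_\varepsilon}$ and the sub-family
$\{P_{\vartheta^{m^\star_\varepsilon}}\}_{m^\star_\varepsilon}$ of prior distributions which no
longer depend on the parameter of interest $\theta^\circ$ but only on the set of possible
parameters $\Theta_a^r$ characterised by the weight sequence $a$. The next assertion can be
shown along the lines of the proof of Theorem 3.8, and hence we omit its proof.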
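Theorem 3.9 (Minimax optimal Bayes estimator). Let Assumption A.2 be satisfied.
Considering the Bayes estimator
$\hat\theta^{m^\star_\varepsilon} := E[\vartheta^{m^\star_\varepsilon} \mid Y]$ we have

$$\sup_{\theta^\circ\in\Theta_a^r} E_{\theta^\circ}\|\hat\theta^{m^\star_\varepsilon} - \theta^\circ\|_{\ell^2}^2 \le (2 + r/d^2)(1 \vee r)\Phi^\star_\varepsilon \quad\text{for all } \varepsilon \in (0, \varepsilon_\circ).$$

The last assertion establishes the minimax optimality of the Bayes estimate
$\hat\theta^{m^\star_\varepsilon}$ over the class $\Theta_a^r$. Moreover, the minimax rate
$\Phi^\star_\varepsilon$ provides up to a constant a lower and an upper bound for the
posterior concentration rate associated with the prior sub-family
$\{P_{\vartheta^{m^\star_\varepsilon}}\}_{m^\star_\varepsilon}$, which is summarised in the
next assertion.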

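Theorem 3.10 (Minimax optimal posterior concentration rate). Let Assumption A.2
and A.3 hold true. If there exists a constant
$1 \le L^\star := L^\star(a, \lambda) < \infty$ such that

$$\sup_{0<\varepsilon<\varepsilon_\circ} \varepsilon m^\star_\varepsilon\Lambda_{(m^\star_\varepsilon)}(\Phi^\star_\varepsilon)^{-1} \le L^\star \tag{3.14}$$

and $K^\star := K^\star(r, a, \lambda, d, \kappa^\star) := 10((1 + 1/d) \vee r/d^2)(1 \vee r)(L^\star/\kappa^\star)$,
then

$$\lim_{\varepsilon\to0}\inf_{\theta^\circ\in\Theta_a^r} E_{\theta^\circ} P_{\vartheta^{m^\star_\varepsilon}|Y}\big((K^\star)^{-1}\Phi^\star_\varepsilon \le \|\vartheta^{m^\star_\varepsilon} - \theta^\circ\|_{\ell^2}^2 \le K^\star\Phi^\star_\varepsilon\big) = 1.$$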

Comparing the last result with the result of Theorem 3.7 and keeping in mind that
$(1 \vee r)\Phi^\star_\varepsilon \ge \Phi^\circ_\varepsilon$, the posterior concentration rate
associated with the prior sub-family
$\{P_{\vartheta^{m^\star_\varepsilon}}\}_{m^\star_\varepsilon}$ is of the order of the minimax
rate $\Phi^\star_\varepsilon$ uniformly for all parameters of interest
$\theta^\circ \in \Theta_a^r$. However, for certain parameters $\theta^\circ$ the minimax rate
$\Phi^\star_\varepsilon$ may be far slower than the oracle rate $\Phi^\circ_\varepsilon$. For
example, as shown in case [P-P] in the following illustration, the minimax rate is of order
$O(\varepsilon^{2p/(2a+2p+1)})$ while it is not hard to see that for all parameters
$\theta^\circ$ with $\mathfrak{b}_m \asymp \exp(-m^{2p})$ the oracle rate is of order
$O(\varepsilon|\log\varepsilon|^{(2a+1)/(2p)})$ (see case [E-P]). Moreover, the optimal choice
$m^\star_\varepsilon$ of the dimension parameter still depends on the class $\Theta_a^r$,
which might be unknown in practice. Therefore, we will consider in the next section a fully
data-driven choice using a hierarchical specification of the prior distribution.

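Illustration 1. We illustrate the last assumptions and the minimax rate for typical
choices of the sequences $a$ and $\lambda$. For two strictly positive sequences
$(a_j)_{j\ge1}$ and $(b_j)_{j\ge1}$ we write $a_j \asymp b_j$, if $(a_j/b_j)_{j\ge1}$ is
bounded away from 0 and infinity.

[P-P] Consider $a_j \asymp j^{-2p}$ and $\lambda_j^2 \asymp j^{-2a}$ with $p > 0$ and
$a > 0$, then $m^\star_\varepsilon \asymp \varepsilon^{-1/(2p+2a+1)}$ and
$\Phi^\star_\varepsilon \asymp \varepsilon^{2p/(2a+2p+1)}$.

[E-P] Consider $a_j \asymp \exp(-j^{2p} + 1)$ and $\lambda_j^2 \asymp j^{-2a}$ with $p > 0$
and $a > 0$, then
$m^\star_\varepsilon \asymp |\log\varepsilon - \frac{2a+1}{2p}(\log|\log\varepsilon|)|^{1/(2p)}$
and $\Phi^\star_\varepsilon \asymp \varepsilon|\log\varepsilon|^{(2a+1)/(2p)}$.

[P-E] Consider $a_j \asymp j^{-2p}$ and $\lambda_j^2 \asymp \exp(-j^{2a} + 1)$, with $p > 0$
and $a > 0$, then $m^\star_\varepsilon \asymp |\log\varepsilon - \ldots|^{1/(2a)}$ and
$\Phi^\star_\varepsilon \asymp |\log\varepsilon|^{-p/a}$.

In all three cases Assumption A.3 and (3.14) hold true. □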
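The polynomial case [P-P] can be checked numerically. The sketch below assumes the exact sequences $a_j = j^{-2p}$ and $\Lambda_j = \lambda_j^{-2} = j^{2a}$ (the text only requires these up to constants), truncated at a finite length, and compares the minimiser of $a_m \vee \varepsilon\sum_{j\le m}\Lambda_j$ with the stated orders. The function name `minimax_dimension` and the variable `alpha` (standing for the exponent written $a$ in the text, renamed to avoid a clash with the sequence `a`) are illustrative choices.

```python
def minimax_dimension(a, Lam, eps):
    """Smallest minimiser m of max(a_m, eps * sum_{j<=m} Lam_j) and the attained value."""
    best_m, best_val, cum = None, float("inf"), 0.0
    for m, (am, Lm) in enumerate(zip(a, Lam), start=1):
        cum += Lm                          # m * bar(Lambda)_m
        val = max(am, eps * cum)
        if val < best_val:
            best_m, best_val = m, val
    return best_m, best_val


p, alpha, n = 1.0, 1.0, 200
a = [j ** (-2 * p) for j in range(1, n + 1)]
Lam = [j ** (2 * alpha) for j in range(1, n + 1)]
for eps in (1e-3, 1e-5, 1e-7):
    m_star, phi_star = minimax_dimension(a, Lam, eps)
    # Compare with the orders eps^(-1/(2p+2a+1)) and eps^(2p/(2a+2p+1)).
    print(eps, m_star, eps ** (-1 / (2 * p + 2 * alpha + 1)),
          phi_star, eps ** (2 * p / (2 * alpha + 2 * p + 1)))
```

The printed pairs should stay within a bounded factor of each other as the noise level decreases, which is all that the relation $\asymp$ asserts.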


4 Data-driven Bayesian estimation 


We will derive in this section a concentration rate given the aforementioned hierarchical
prior distribution. For this purpose we impose additional conditions on the behaviour
of the sequence $\lambda = (\lambda_j)_{j\ge1}$.

Assumption A.4. There exist finite constants $C_\lambda \ge 1$ and $L_\lambda \ge 1$ such
that for all $k, l \in \mathbb{N}$ hold (i)
$\max_{j>k}\lambda_j^2 \le C_\lambda\min_{1\le j\le k}\lambda_j^2 = C_\lambda\Lambda_{(k)}^{-1}$;
(ii) $\Lambda_{(kl)} \le \Lambda_{(k)}\Lambda_{(l)}$; (iii)
$1 \le \Lambda_{(k)}/\bar\Lambda_k \le L_\lambda$.

We may emphasise that Assumption A.4 (i) holds trivially with $C_\lambda = 1$ if the sequence
$\lambda$ is monotonically decreasing. Moreover, considering the typical choices of the
sequence $\lambda$ presented in Illustration 1, Assumption A.4 (ii) and (iii) hold true only
in case of a polynomial decay, i.e., [P-P] and [E-P]. In other words, Assumption A.4 excludes
an exponential decay of $\lambda$, i.e., [P-E].

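Assumption A.5. Let $\theta^\times$, $\theta^\circ$ and $\lambda$ be sequences such that

$$0 < \kappa^\circ := \kappa^\circ(\theta^\times, \theta^\circ, \lambda) := \inf_{0<\varepsilon<\varepsilon_\circ}\big\{(\Phi^\circ_\varepsilon)^{-1}[\mathfrak{b}_{m^\circ_\varepsilon} \wedge \varepsilon m^\circ_\varepsilon\bar\Lambda_{m^\circ_\varepsilon}]\big\} \le 1. \tag{4.1}$$

Observe that $\mathfrak{b}_{m^\circ_\varepsilon} \ge \kappa^\circ\Phi^\circ_\varepsilon > 0$
due to Assumption A.5, which in turn implies $\mathfrak{b}_k > 0$ for all $k \in \mathbb{N}$
and, hence, $m^\circ_\varepsilon \to \infty$ as $\varepsilon \to 0$. Indeed, if there exists
$K \in \mathbb{N}$ such that $\mathfrak{b}_K = 0$ and $\mathfrak{b}_{K-1} > 0$ then there
exists $\varepsilon_\circ \in (0,1)$ with
$\varepsilon_\circ K\bar\Lambda_K < \mathfrak{b}_{K-1}$ and for all
$\varepsilon \in (0, \varepsilon_\circ)$ it is easily seen that $m^\circ_\varepsilon = K$ and
hence $\mathfrak{b}_{m^\circ_\varepsilon} = 0$. Moreover, due to Assumption A.4 (iii) there
exists a constant $L_\lambda$ depending only on $\lambda$ such that
$\varepsilon m^\circ_\varepsilon\Lambda_{(m^\circ_\varepsilon)}(\Phi^\circ_\varepsilon)^{-1} \le \Lambda_{(m^\circ_\varepsilon)}(\bar\Lambda_{m^\circ_\varepsilon})^{-1} \le L_\lambda$,
i.e., condition (3.7) holds true uniformly for all parameters $\theta^\circ \in \ell^2$.
If we suppose in addition to Assumption A.4 and A.5 that the sequence of prior variances
meets Assumption A.2 and that $m^\circ_\varepsilon \to \infty$ as $\varepsilon \to 0$, then
the assumptions of Theorem 3.7 are satisfied and $\Phi^\circ_\varepsilon$ provides up to a
constant an upper and lower bound of the posterior concentration rate associated with the
oracle prior sub-family $\{P_{\vartheta^{m^\circ_\varepsilon}}\}_{m^\circ_\varepsilon}$.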

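Let us specify the prior distribution $P_M$ of the thresholding parameter $M$ taking its
values in $\{1, \dots, G_\varepsilon\}$ with $G_\varepsilon$ as in Assumption A.2, and for
$1 \le m \le G_\varepsilon$

$$p_M(m) = \frac{\exp(-3C_\lambda m/2)\prod_{j=1}^m(\varsigma_j/\sigma_j)^{1/2}}{\sum_{k=1}^{G_\varepsilon}\exp(-3C_\lambda k/2)\prod_{j=1}^k(\varsigma_j/\sigma_j)^{1/2}}. \tag{4.2}$$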


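Keeping in mind the sequences $\theta^Y = (\theta^Y_j)_{j\ge1}$ and
$\sigma = (\sigma_j)_{j\ge1}$ of conditional means and variances, respectively, given by
$\theta^Y_j = \sigma_j(\lambda_j\varepsilon^{-1}Y_j + \varsigma_j^{-1}\theta^\times_j)$ and
$\sigma_j = (\varsigma_j^{-1} + \lambda_j^2\varepsilon^{-1})^{-1}$, for each
$m \in \mathbb{N}$ the sequence $\hat\theta^m = E[\vartheta^m \mid Y]$ of posterior means of
$\vartheta^m$ satisfies
$\hat\theta^m_j = \theta^Y_j\mathbb{1}_{\{1\le j\le m\}} + \theta^\times_j\mathbb{1}_{\{j>m\}}$.
Introducing further the weighted norm
$\|\theta\|_\sigma^2 := \sum_{j\ge1}\theta_j^2/\sigma_j$, the posterior distribution
$P_{M|Y}$ of the thresholding parameter $M$ is given by

$$p_{M|Y}(m) = P_{M|Y}(M = m) = \frac{\exp\big(-\tfrac12\{-\|\hat\theta^m - \theta^\times\|_\sigma^2 + 3C_\lambda m\}\big)}{\sum_{k=1}^{G_\varepsilon}\exp\big(-\tfrac12\{-\|\hat\theta^k - \theta^\times\|_\sigma^2 + 3C_\lambda k\}\big)}. \tag{4.3}$$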
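The posterior weights of the thresholding parameter can be evaluated as follows. The sketch assumes the form of (4.3) with the $\sigma$-weighted norm $\|\theta\|_\sigma^2 = \sum_j\theta_j^2/\sigma_j$, in which case $\|\hat\theta^m - \theta^\times\|_\sigma^2$ reduces to a partial sum over the first $m$ coordinates because $\hat\theta^m$ coincides with the prior mean beyond $m$. The function name `posterior_of_m` is hypothetical, and the maximum log-weight is subtracted before exponentiating for numerical stability.

```python
import math


def posterior_of_m(post_mean, prior_mean, post_var, c_lambda, G):
    """Posterior pmf of M on {1, ..., G}.

    The weight of m is proportional to
    exp(0.5 * sum_{j<=m} (post_mean_j - prior_mean_j)**2 / post_var_j - 1.5 * c_lambda * m).
    """
    log_w, norm_sq = [], 0.0
    for m in range(1, G + 1):
        diff = post_mean[m - 1] - prior_mean[m - 1]
        norm_sq += diff * diff / post_var[m - 1]     # weighted squared norm up to m
        log_w.append(0.5 * norm_sq - 1.5 * c_lambda * m)
    top = max(log_w)                                  # log-sum-exp stabilisation
    w = [math.exp(x - top) for x in log_w]
    total = sum(w)
    return [x / total for x in w]
```

Each additional coordinate is thus rewarded by its standardised squared deviation from the prior mean and penalised by the constant $3C_\lambda/2$ on the log scale.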


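Interestingly, the posterior distribution $P_{M|Y}$ of the thresholding parameter $M$ is
concentrating around the oracle dimension parameter $m^\circ_\varepsilon$ as $\varepsilon$
tends to zero. To be more precise, there exists $\varepsilon_\circ \in (0,1)$ such that
$m^\circ_\varepsilon \le G_\varepsilon$ for all $\varepsilon \in (0, \varepsilon_\circ)$ since
$\Phi^\circ_\varepsilon = o(1)$ for $\varepsilon \to 0$. Let us further define for all
$\varepsilon \in (0, \varepsilon_\circ)$

$$G^-_\varepsilon := \min\{m \in \{1, \dots, m^\circ_\varepsilon\} : \mathfrak{b}_m \le 8L_\lambda C_\lambda(1 + 1/d)\Phi^\circ_\varepsilon\} \quad\text{and}$$

$$G^+_\varepsilon := \max\{m \in \{m^\circ_\varepsilon, \dots, G_\varepsilon\} : m \le 5L_\lambda(\varepsilon\Lambda_{(m^\circ_\varepsilon)})^{-1}\Phi^\circ_\varepsilon\} \tag{4.4}$$

where the defining sets are not empty under Assumption A.4 since
$8L_\lambda C_\lambda(1 + 1/d)\Phi^\circ_\varepsilon \ge 8L_\lambda C_\lambda(1 + 1/d)\mathfrak{b}_{m^\circ_\varepsilon} \ge \mathfrak{b}_{m^\circ_\varepsilon}$
and
$5L_\lambda(\varepsilon\Lambda_{(m^\circ_\varepsilon)})^{-1}\Phi^\circ_\varepsilon \ge 5m^\circ_\varepsilon \ge m^\circ_\varepsilon$.
Moreover, under Assumption A.5 it is easily verified that $G^-_\varepsilon \to \infty$ as
$\varepsilon \to 0$.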

Lemma 4.1. If Assumptions A.2 and A.4 hold true then for all $\varepsilon \in (0, \varepsilon_\circ)$

(i) E 0 oPm|y (1 ^ M < G") ^ 2 exp ( - logG^) ^ 2 exp ( - ^ml + logG^); 

(ii) E 0 oPm|y(G+ < M ^ Ge) ^ 2exp ( - ^ml -FlogGe) ^ 2exp(- ^ml -MogGe). 

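Recall that $m^\circ_\varepsilon \to \infty$ as $\varepsilon \to 0$ under Assumption A.5. If in
addition $m^\circ_\varepsilon/(\log G_\varepsilon) \to \infty$ as $\varepsilon \to 0$ then
Lemma 4.1 states that the posterior distribution of the thresholding parameter $M$ is
vanishing outside the set $\{G^-_\varepsilon, \dots, G^+_\varepsilon\}$ as
$\varepsilon \to 0$. On the other hand, the posterior distribution $P_{\vartheta^M|Y}$ of
$\vartheta^M$ associated with the hierarchical prior is a weighted mixture of the posterior
distributions $\{P_{\vartheta^m|Y}\}_{m=1}^{G_\varepsilon}$ studied in Section 3, that is,
$P_{\vartheta^M|Y} = \sum_{m=1}^{G_\varepsilon} p_{M|Y}(m)P_{\vartheta^m|Y}$. The next
assertion shows that, considering posterior distributions
$\{P_{\vartheta^m|Y}\}_{m=G^-_\varepsilon}^{G^+_\varepsilon}$ associated with thresholding
parameters belonging to $\{G^-_\varepsilon, \dots, G^+_\varepsilon\}$ only, their
concentration rate equals $\Phi^\circ_\varepsilon$ up to a constant.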

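Lemma 4.2. If Assumptions A.2, A.4 and A.5 hold true then for all $\varepsilon\in(0,\varepsilon_\circ)$

(i) $\sum_{m=G^-_\varepsilon}^{G^+_\varepsilon}\mathbb E_{\theta^\circ}P_{\vartheta^m|Y}\bigl(\|\vartheta^m-\theta^\circ\|^2_{\ell^2}>K^\circ\Phi^\circ_\varepsilon\bigr)\le74\exp(-G^-_\varepsilon/36)$;

(ii) $\sum_{m=G^-_\varepsilon}^{G^+_\varepsilon}\mathbb E_{\theta^\circ}P_{\vartheta^m|Y}\bigl(\|\vartheta^m-\theta^\circ\|^2_{\ell^2}<(K^\circ)^{-1}\Phi^\circ_\varepsilon\bigr)\le4(K^\circ)^2\exp(-G^-_\varepsilon/(K^\circ)^2)$,

where $K^\circ:=10\bigl((1+1/d)\vee\|\theta^\circ-\theta^\times\|^2_{\ell^2}/d^2\bigr)L_\lambda^2\bigl(8C_\lambda(1+1/d)\vee D^\circ\Lambda_{(D^\circ)}\bigr)$ with $D^\circ:=
D^\circ(\theta^\times,\theta^\circ,\lambda):=\lceil5L_\lambda/\kappa^\circ\rceil$.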
From Lemma 4.1 and 4.2 we derive next upper and lower bounds for the concentration
rate of the posterior distribution $P_{\vartheta^M|Y}$ by decomposing the weighted mixture into three
parts with respect to $G^-_\varepsilon$ and $G^+_\varepsilon$ which we bound separately.

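Theorem 4.3 (Oracle posterior concentration rate). Let Assumptions A.2, A.4 and A.5
hold true. If in addition $(\log G_\varepsilon)/m^\circ_\varepsilon\to0$ as $\varepsilon\to0$, then

$$\lim_{\varepsilon\to0}\mathbb E_{\theta^\circ}P_{\vartheta^M|Y}\bigl((K^\circ)^{-1}\Phi^\circ_\varepsilon\le\|\vartheta^M-\theta^\circ\|^2_{\ell^2}\le K^\circ\Phi^\circ_\varepsilon\bigr)=1$$

where $K^\circ$ is given in Lemma 4.2.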

We shall emphasise that the Bayes estimator $\widehat\theta:=(\widehat\theta_j)_{j\ge1}:=\mathbb E[\vartheta^M\,|\,Y]$ associated with
the hierarchical prior and given by $\widehat\theta_j=\theta^\times_j$ for $j>G_\varepsilon$ and $\widehat\theta_j=\theta^\times_j\,P(1\le M<
j\,|\,Y)+\theta^Y_j\,P(j\le M\le G_\varepsilon\,|\,Y)$ for all $1\le j\le G_\varepsilon$, does not take into account any prior
information related to the parameter of interest, and hence it is fully data-driven. The
next assertion provides an upper bound of its MISE.
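The componentwise form of the Bayes estimator described above can be sketched in a few lines. This is an illustration only: the function name `hierarchical_bayes_estimator` is hypothetical, NumPy is assumed, and the posterior weights of $M$ are taken as given input (for instance computed from (4.3)).

```python
import numpy as np

def hierarchical_bayes_estimator(theta_Y, prior_mean, post_M):
    """Combine conditional means theta^Y_j and prior means theta^x_j.

    post_M[m-1] is assumed to hold P(M = m | Y) for m = 1..G, with
    G <= len(theta_Y) == len(prior_mean).
    """
    theta_Y = np.asarray(theta_Y, dtype=float)
    prior_mean = np.asarray(prior_mean, dtype=float)
    post_M = np.asarray(post_M, dtype=float)
    G = len(post_M)
    # tail[j-1] = P(j <= M <= G | Y), obtained by a reversed cumulative sum.
    tail = np.cumsum(post_M[::-1])[::-1]
    est = prior_mean.copy()                    # components j > G keep the prior mean
    est[:G] = prior_mean[:G] * (1.0 - tail) + theta_Y[:G] * tail
    return est
```

Since $P(1\le M<j\,|\,Y)=1-P(j\le M\le G_\varepsilon\,|\,Y)$, each component is a convex combination of the prior mean and the conditional mean, with the weight on the data decreasing in $j$.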

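Theorem 4.4 (Oracle optimal Bayes estimator). Under Assumptions A.2, A.4 and
A.5 consider the Bayes estimator $\widehat\theta:=\mathbb E[\vartheta^M\,|\,Y]$. If in addition $\log(G_\varepsilon/\Phi^\circ_\varepsilon)/m^\circ_\varepsilon\to
0$ as $\varepsilon\to0$, then there exists a constant $K^\circ:=K^\circ(\theta^\circ,\theta^\times,\lambda,d,L)<\infty$ such that
$\mathbb E_{\theta^\circ}\|\widehat\theta-\theta^\circ\|^2_{\ell^2}\le K^\circ\Phi^\circ_\varepsilon$ for all $\varepsilon\in(0,\varepsilon_\circ)$.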

Both Theorems 4.3 and 4.4 hold true only under Assumption A.5, which, as we have
seen before, imposes an additional restriction on the parameter of interest $\theta^\circ$, i.e., its
components differ from the components of the prior mean $\theta^\times$ infinitely many times.
However, for all parameters of interest satisfying Assumption A.5, the hierarchical prior
sequence allows us to recover the oracle posterior concentration rate, and the fully
data-driven Bayes estimator attains the oracle rate. In the last part of this section we show
that for all $\theta^\circ\in\Theta^r_{\mathfrak a}$ the posterior concentration rate and the MISE of the Bayes estimator
associated with the hierarchical prior are bounded from above by the minimax rate $\Phi^\star_\varepsilon$ up
to a constant. In other words, the fully data-driven hierarchical prior and the associated
Bayes estimator are minimax-rate optimal.

Recall the definition (3.12) of $m^\star_\varepsilon$ and $\Phi^\star_\varepsilon$. Consider the prior distribution $P_M$ of the
thresholding parameter $M$, and observe that there exists $\varepsilon_\star$ such that $m^\star_\varepsilon\le G_\varepsilon$ for all
$\varepsilon\in(0,\varepsilon_\star)$ since $\Phi^\star_\varepsilon=o(1)$ as $\varepsilon\to0$. Remark that $\varepsilon m^\star_\varepsilon\Lambda_{(m^\star_\varepsilon)}(\Phi^\star_\varepsilon)^{-1}\le\Lambda_{(m^\star_\varepsilon)}(\overline\Lambda_{m^\star_\varepsilon})^{-1}\le
L_\lambda$ with $L_\lambda$ depending only on $\lambda$ due to Assumption A.4 (iii), i.e., condition (3.14)
holds true uniformly for all parameters in $\ell^2$. If we assume in addition that the
sequence of prior variances satisfies Assumption A.2 and that Assumption A.3 holds true,
then the conditions of Theorem 3.10 are satisfied and $\Phi^\star_\varepsilon$ provides, up to a constant, an
upper and lower bound of the posterior concentration rate associated with the minimax
prior sub-family $\{P_{\vartheta^{m^\star_\varepsilon}}\}_{m^\star_\varepsilon}$. On the other hand, the posterior distribution $P_{M|Y}$ of
the thresholding parameter $M$ concentrates around the minimax-optimal dimension
parameter $m^\star_\varepsilon$ as $\varepsilon$ tends to zero. To be more precise, for $\varepsilon\in(0,\varepsilon_\star)$ let us define

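$$G^{\star-}_\varepsilon:=\min\bigl\{m\in\{1,\dots,m^\star_\varepsilon\}:\ \mathfrak b_m\le8L_\lambda C_\lambda(1+1/d)(1\vee r)\Phi^\star_\varepsilon\bigr\}\quad\text{and}$$

$$G^{\star+}_\varepsilon:=\max\bigl\{m\in\{m^\star_\varepsilon,\dots,G_\varepsilon\}:\ m\le5L_\lambda(\varepsilon\Lambda_{(m^\star_\varepsilon)})^{-1}(1\vee r)\Phi^\star_\varepsilon\bigr\} \tag{4.5}$$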

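where the defining sets are not empty under Assumption A.4 since $8L_\lambda C_\lambda(1+1/d)(1\vee
r)\Phi^\star_\varepsilon\ge8L_\lambda C_\lambda(1+1/d)\,r\,\mathfrak a_{m^\star_\varepsilon}\ge8L_\lambda C_\lambda(1+1/d)\mathfrak b_{m^\star_\varepsilon}\ge\mathfrak b_{m^\star_\varepsilon}$ and $5L_\lambda(\varepsilon\Lambda_{(m^\star_\varepsilon)})^{-1}(1\vee r)\Phi^\star_\varepsilon\ge
5m^\star_\varepsilon\ge m^\star_\varepsilon$. Moreover, it is again straightforward to see that $G^{\star-}_\varepsilon\to\infty$ as $\varepsilon\to0$.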
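
Lemma 4.5. If Assumptions A.2 and A.4 hold true then for all $\theta^\circ\in\Theta^r_{\mathfrak a}$ and $\varepsilon\in(0,\varepsilon_\star)$

(i) $\mathbb E_{\theta^\circ}P_{M|Y}(1\le M<G^{\star-}_\varepsilon)\le2\exp\bigl(-\tfrac{C_\lambda}{5}m^\star_\varepsilon+\log G_\varepsilon\bigr)$;

(ii) $\mathbb E_{\theta^\circ}P_{M|Y}(G^{\star+}_\varepsilon<M\le G_\varepsilon)\le2\exp\bigl(-\tfrac{C_\lambda}{5}m^\star_\varepsilon+\log G_\varepsilon\bigr)$.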

By employing Lemma 4.5 we show next for each $\theta^\circ\in\Theta^r_{\mathfrak a}$ that the minimax rate $\Phi^\star_\varepsilon$
provides up to a constant an upper bound for the posterior concentration rate associated
with the fully data-driven hierarchical prior distribution $P_{\vartheta^M}$.

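Theorem 4.6 (Minimax optimal posterior concentration rate). Let Assumptions A.2,
A.3 and A.4 hold true. If in addition $(\log G_\varepsilon)/m^\star_\varepsilon\to0$ as $\varepsilon\to0$, then

(i) for all $\theta^\circ\in\Theta^r_{\mathfrak a}$ we have

$$\lim_{\varepsilon\to0}\mathbb E_{\theta^\circ}P_{\vartheta^M|Y}\bigl(\|\vartheta^M-\theta^\circ\|^2_{\ell^2}\le K^\star\Phi^\star_\varepsilon\bigr)=1$$

where $K^\star:=16\bigl((1+1/d)\vee r/d^2\bigr)L_\lambda^2\bigl(8C_\lambda(1+1/d)\vee D^\star\Lambda_{(D^\star)}\bigr)(1\vee r)$ with $D^\star:=
D^\star(\mathfrak a,\lambda):=\lceil5L_\lambda/\kappa^\star\rceil$;

(ii) for any monotonically increasing and unbounded sequence $(K_\varepsilon)_\varepsilon$ it holds that

$$\lim_{\varepsilon\to0}\inf_{\theta^\circ\in\Theta^r_{\mathfrak a}}\mathbb E_{\theta^\circ}P_{\vartheta^M|Y}\bigl(\|\vartheta^M-\theta^\circ\|^2_{\ell^2}\le K_\varepsilon\Phi^\star_\varepsilon\bigr)=1.$$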

We shall emphasise that due to Theorem 4.3 for all $\theta^\circ\in\Theta^r_{\mathfrak a}$ satisfying Assumption
A.5 the posterior concentration rate associated with the hierarchical prior attains the
oracle rate $\Phi^\circ_\varepsilon$, which might be far smaller than the minimax rate $\Phi^\star_\varepsilon$. Consequently,
the minimax rate cannot provide a uniform lower bound over $\Theta^r_{\mathfrak a}$ for the posterior
concentration rate associated with the hierarchical prior. However, due to Theorem 4.6
the posterior concentration rate is for all $\theta^\circ\in\Theta^r_{\mathfrak a}$, regardless of whether Assumption A.5
holds, at least of the order of the minimax rate $\Phi^\star_\varepsilon$. The next assertion establishes the
minimax-rate optimality of the fully data-driven Bayes estimator.
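Theorem 4.7 (Minimax optimal Bayes estimate). Under Assumptions A.2, A.3 and A.4
consider the Bayes estimator $\widehat\theta:=\mathbb E[\vartheta^M\,|\,Y]$. If in addition $\log(G_\varepsilon/\Phi^\star_\varepsilon)/m^\star_\varepsilon\to0$ as $\varepsilon\to
0$, then there exists $K^\star:=K^\star(\Theta^r_{\mathfrak a},\lambda,d)<\infty$ such that $\sup_{\theta^\circ\in\Theta^r_{\mathfrak a}}\mathbb E_{\theta^\circ}\|\widehat\theta-\theta^\circ\|^2_{\ell^2}\le K^\star\Phi^\star_\varepsilon$
for all $\varepsilon\in(0,\varepsilon_\star)$.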

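Let us briefly comment on the last assertion by considering again the improper
specification of the prior family $\{P_{\vartheta^m}\}_m$ introduced in Section 2. Recall that in this
situation for each $m\in\mathbb N$ the Bayes estimator $\widehat\theta^m=\mathbb E[\vartheta^m\,|\,Y]$ of $\vartheta^m$ equals an
orthogonal projection estimator, i.e., $\widehat\theta^m=(Y/\lambda)^m$. Moreover, the posterior
probability of the thresholding parameter $M$ taking a value $m\in\{1,\dots,G_\varepsilon\}$ is
proportional to $\exp\bigl(-\tfrac12\{-\|(Y/\lambda)^m\|^2_\sigma+3C_\lambda m\}\bigr)$, and hence the data-driven Bayes estimator
$\widehat\theta=(\widehat\theta_j)_{j\ge1}=\mathbb E[\vartheta^M\,|\,Y]$ equals the shrunk orthogonal projection estimator given by

$$\widehat\theta_j=\frac{Y_j}{\lambda_j}\cdot\frac{\sum_{m=j}^{G_\varepsilon}\exp\bigl(-\tfrac12\{-\|(Y/\lambda)^m\|^2_\sigma+3C_\lambda m\}\bigr)}{\sum_{m=1}^{G_\varepsilon}\exp\bigl(-\tfrac12\{-\|(Y/\lambda)^m\|^2_\sigma+3C_\lambda m\}\bigr)}\qquad\text{for }1\le j\le G_\varepsilon.$$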

From Theorem 4.7 it follows now that the fully data-driven shrinkage estimator $\widehat\theta$ is
minimax-optimal up to a constant for a wide variety of parameter spaces $\Theta^r_{\mathfrak a}$ provided
Assumptions A.3 and A.4 hold true. Interestingly, identifying $\Upsilon(\widehat\theta^m):=-(1/2)\|(Y/\lambda)^m\|^2_\sigma$
as a contrast and $\mathrm{pen}_m:=(3/2)C_\lambda m$ as a penalty term, the $j$-th shrinkage weight is
proportional to $\sum_{m=j}^{G_\varepsilon}\exp\bigl(-\{\Upsilon(\widehat\theta^m)+\mathrm{pen}_m\}\bigr)$. Roughly speaking, in comparison to a classical
model selection approach where a data-driven estimator $\widehat\theta^{\widehat m}=(Y/\lambda)^{\widehat m}$ is obtained by
selecting the dimension parameter $\widehat m$ as a minimiser of a penalised contrast criterion over a
class of admissible models $\{1,\dots,G_\varepsilon\}$, i.e., $\widehat m=\arg\min_{1\le m\le G_\varepsilon}\{\Upsilon(\widehat\theta^m)+\mathrm{pen}_m\}$,
in the Bayesian approach each of the components of the data-driven Bayes estimator
is shrunk proportionally to the associated values of the penalised contrast criterion.
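The contrast between shrinking and selecting can be made concrete with a small numerical sketch. It is not taken from the paper: the function name `shrinkage_vs_selection` is hypothetical, NumPy is assumed, and the sketch assumes that under the improper prior the conditional variance is $\sigma_j=\varepsilon/\lambda_j^2$, so that $\|(Y/\lambda)^m\|^2_\sigma=\sum_{j\le m}Y_j^2/\varepsilon$.

```python
import numpy as np

def shrinkage_vs_selection(Y, lam, eps, C_lam):
    """Return (shrunk estimator, selected projection estimator, m_hat).

    Assumption of this sketch: sigma_j = eps / lam_j^2 (flat prior), so the
    contrast is -(1/2) * sum_{j<=m} Y_j^2 / eps, and G = len(Y).
    """
    Y = np.asarray(Y, dtype=float)
    lam = np.asarray(lam, dtype=float)
    m = np.arange(1, len(Y) + 1)
    # Penalised contrast: contrast(m) + pen(m) with pen(m) = (3/2) * C_lam * m.
    crit = -0.5 * np.cumsum(Y**2) / eps + 1.5 * C_lam * m
    w = np.exp(-(crit - crit.min()))           # weights proportional to exp(-crit)
    w /= w.sum()
    tail = np.cumsum(w[::-1])[::-1]            # j-th shrinkage weight: sum over m >= j
    proj = Y / lam
    shrunk = proj * tail
    m_hat = int(np.argmin(crit)) + 1           # classical model selection
    selected = np.where(m <= m_hat, proj, 0.0)
    return shrunk, selected, m_hat
```

Model selection keeps the first `m_hat` coefficients untouched and discards the rest, whereas the Bayes estimator multiplies every coefficient by a weight in $[0,1]$ that is non-increasing in $j$.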

Conclusions and perspectives. In this paper we have presented a hierarchical prior
leading to a fully data-driven Bayes estimator that is minimax-optimal in an indirect
sequence space model. The concentration rate based on a hierarchical prior
in an indirect sequence space model with additional noise in the eigenvalues is only
one amongst the many interesting questions for further research, and we are currently
exploring this topic. Moreover, inspired by the specific form of the fully data-driven
Bayes estimator, as discussed in the last section, we are currently studying the effect of
different choices for the contrast and the penalty term on the properties of the estimator.
A Appendix: Proofs of Section 3

Proof of Lemma 3.1. Let $X_j=\beta_jZ_j+\alpha_j$ with independent and standard normally
distributed random variables $\{Z_j\}_{j=1}^m$. We start our proof with the observation that
$\mathbb E(S_m)=\sum_{j=1}^m(\beta_j^2+\alpha_j^2)$ and define $\Sigma_m:=\tfrac12\sum_{j=1}^m\mathrm{Var}\bigl((\beta_jZ_j+\alpha_j)^2\bigr)=\sum_{j=1}^m\beta_j^2(\beta_j^2+
2\alpha_j^2)$. Let $t_m:=\max_{1\le j\le m}\beta_j^2$ and by using that $v_m\ge\sum_{j=1}^m\beta_j^2$ and $r_m\ge\sum_{j=1}^m\alpha_j^2$ we
have $\mathbb E(S_m)\le v_m+r_m$ and $\Sigma_m\le t_m(v_m+2r_m)$. These bounds are used below without
further reference. There exist several tail bounds for sums of independent
squared Gaussian random variables and we present next a version which is due to Birgé
[2001] and can be shown following the lines of the proof of Lemma 1 in Laurent et al.
[2012]. For all $x>0$ we have

$$P\bigl(S_m-\mathbb ES_m\ge2\sqrt{\Sigma_mx}+2t_mx\bigr)\le\exp(-x)\quad\text{and}\quad P\bigl(S_m-\mathbb ES_m\le-2\sqrt{\Sigma_mx}\bigr)\le\exp(-x). \tag{A.1}$$

Consider (3.2). Keeping in mind that for all $c\ge0$, $(3/2)c(v_m+2r_m)\ge c(v_m+
2r_m)+2t_mc(c\wedge1)(v_m+2r_m)/(4t_m)$ and $(c\vee1)t_m(v_m+2r_m)\ge\Sigma_m$ we conclude for
$x:=c(c\wedge1)(v_m+2r_m)/(4t_m)$ that $(3/2)c(v_m+2r_m)\ge2\sqrt{\Sigma_mx}+2t_mx$ and hence by
employing the first exponential bound in (A.1) we obtain (3.2). On the other hand,
since $c(v_m+2r_m)\ge2\sqrt{\Sigma_mx}$ for all $c\ge0$, assertion (3.1) follows by employing the second
exponential bound in (A.1), which completes the proof. □
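The two deterministic inequalities that drive this proof, namely $2\sqrt{\Sigma_mx}+2t_mx\le(3/2)c(v_m+2r_m)$ and $2\sqrt{\Sigma_mx}\le c(v_m+2r_m)$ for the choice $x=c(c\wedge1)(v_m+2r_m)/(4t_m)$, can be sanity-checked numerically. The sketch below is an illustration only (the helper name `check_choice_of_x` is hypothetical) and takes $v_m$, $r_m$ and $t_m$ equal to their smallest admissible values.

```python
import math
import random

def check_choice_of_x(alpha, beta, c):
    """Return True if both inequalities used in the proof hold for this input."""
    v = sum(b * b for b in beta)            # v_m = sum of beta_j^2
    r = sum(a * a for a in alpha)           # r_m = sum of alpha_j^2
    t = max(b * b for b in beta)            # t_m = max of beta_j^2
    # Sigma_m = sum_j beta_j^2 * (beta_j^2 + 2 * alpha_j^2)
    big_sigma = sum(b * b * (b * b + 2 * a * a) for a, b in zip(alpha, beta))
    x = c * min(c, 1.0) * (v + 2 * r) / (4 * t)
    root = 2 * math.sqrt(big_sigma * x)
    tol = 1e-9 * (1 + c * (v + 2 * r))      # slack for floating-point rounding
    upper_ok = root + 2 * t * x <= 1.5 * c * (v + 2 * r) + tol
    lower_ok = root <= c * (v + 2 * r) + tol
    return upper_ok and lower_ok

rng = random.Random(0)
all_ok = all(
    check_choice_of_x(
        [rng.uniform(-3, 3) for _ in range(5)],
        [rng.uniform(0.1, 2) for _ in range(5)],
        rng.uniform(0.01, 5),
    )
    for _ in range(1000)
)
print(all_ok)
```

The check covers both regimes $c<1$ and $c\ge1$, which is where the factor $(c\wedge1)$ in the definition of $x$ matters.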

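Proof of Proposition 3.2. We intend to apply the technical Lemma 3.1. Consider
first the assertion (3.3). Let $s_m$ and $c_1$ be positive constants (to be specified below).
Keeping in mind that the posterior distribution of $\vartheta^m_j$ given $Y_j$ is degenerate on $\theta^\times_j$ for
$j>m$ and that $\mathfrak b_m=\sum_{j>m}(\theta^\circ_j-\theta^\times_j)^2$, we have

$$P_{\vartheta^m|Y}\Bigl(\|\vartheta^m-\theta^\circ\|^2_{\ell^2}>\mathfrak b_m+m\overline\sigma_m+\tfrac{3c_1}{2}m\sigma_{(m)}+(3c_1+1)s_m\Bigr)
=P_{\vartheta^m|Y}\Bigl(\sum_{j=1}^m(\vartheta^m_j-\theta^\circ_j)^2>m\overline\sigma_m+\tfrac{3c_1}{2}m\sigma_{(m)}+(3c_1+1)s_m\Bigr).$$

Define $S^\vartheta_m:=\sum_{j=1}^m(\vartheta^m_j-\theta^\circ_j)^2$ where conditional on $Y$ the random variables $\{\vartheta^m_j-\theta^\circ_j\}_{j=1}^m$
are independent and normally distributed with conditional mean $\theta^Y_j-\theta^\circ_j$ and conditional
variance $\sigma_j$. Observe that $m\overline\sigma_m=\sum_{j=1}^m\sigma_j$ and $\mathbb E_{\vartheta^m|Y}[S^\vartheta_m]=m\overline\sigma_m+S^Y_m$ with $S^Y_m:=\sum_{j=1}^m(\theta^Y_j-\theta^\circ_j)^2$.
Introduce the event $\Omega_m:=\{S^Y_m\le s_m\}$, on which obviously $\mathbb E_{\vartheta^m|Y}[S^\vartheta_m]\le
m\overline\sigma_m+s_m$, and hence,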


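$$\mathbb E_{\theta^\circ}P_{\vartheta^m|Y}\Bigl(S^\vartheta_m>m\overline\sigma_m+\tfrac{3c_1}{2}m\sigma_{(m)}+(3c_1+1)s_m\Bigr)1_{\Omega_m}
\le\mathbb E_{\theta^\circ}P_{\vartheta^m|Y}\Bigl(S^\vartheta_m-\mathbb E_{\vartheta^m|Y}[S^\vartheta_m]>\tfrac{3c_1}{2}(m\sigma_{(m)}+2s_m)\Bigr)1_{\Omega_m}.$$

Employing (3.2) in Lemma 3.1 we bound the left hand side in the last display and we
obtain

$$\mathbb E_{\theta^\circ}P_{\vartheta^m|Y}\Bigl(S^\vartheta_m>m\overline\sigma_m+\tfrac{3c_1}{2}m\sigma_{(m)}+(3c_1+1)s_m\Bigr)1_{\Omega_m}\le\exp\Bigl(-\frac{c_1(c_1\wedge1)(m\sigma_{(m)}+2s_m)}{4\sigma_{(m)}}\Bigr)$$

where we used that $m\sigma_{(m)}\ge m\overline\sigma_m$ and $\sigma_{(m)}=\max_{1\le j\le m}\sigma_j$. As a consequence,

$$\mathbb E_{\theta^\circ}P_{\vartheta^m|Y}\Bigl(\|\vartheta^m-\theta^\circ\|^2_{\ell^2}>\mathfrak b_m+m\overline\sigma_m+\tfrac{3c_1}{2}m\sigma_{(m)}+(3c_1+1)s_m\Bigr)
\le\exp\Bigl(-\frac{c_1(c_1\wedge1)(m\sigma_{(m)}+2s_m)}{4\sigma_{(m)}}\Bigr)+P_{\theta^\circ}(\Omega^c_m). \tag{A.2}$$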


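In the following, we bound the remainder probability of the event $\Omega^c_m=\{S^Y_m>s_m\}$ for
$S^Y_m=\sum_{j=1}^m(\theta^Y_j-\theta^\circ_j)^2$, where the random variables $\{\theta^Y_j-\theta^\circ_j\}_{j=1}^m$ are independent and
normally distributed with mean $\mathbb E_{\theta^\circ}[\theta^Y_j]-\theta^\circ_j$ and standard deviation $\beta_j:=(\varepsilon\Lambda_j)^{1/2}\mu_j$ for
$\mu_j:=(\varepsilon\Lambda_j\varsigma_j^{-1}+1)^{-1}$. Since $\sigma_j=\varepsilon\Lambda_j\mu_j$ and $\mu_j\le1$ it follows that $m\overline\sigma_m\ge\sum_{j=1}^m\beta_j^2$
and $\sigma_{(m)}\ge\max_{1\le j\le m}\beta_j^2$. Moreover, $(\mathbb E_{\theta^\circ}[\theta^Y_j]-\theta^\circ_j)^2=(1-\mu_j)^2(\theta^\times_j-\theta^\circ_j)^2$ and hence $\mathbb E_{\theta^\circ}[S^Y_m]\le
m\overline\sigma_m+r_m$. Denote $s_m:=m\overline\sigma_m+\tfrac{3c_2}{2}m\sigma_{(m)}+(3c_2+1)r_m$ which allows us to write

$$P_{\theta^\circ}(\Omega^c_m)=P_{\theta^\circ}\Bigl(S^Y_m>m\overline\sigma_m+\tfrac{3c_2}{2}m\sigma_{(m)}+(3c_2+1)r_m\Bigr)
\le P_{\theta^\circ}\Bigl(S^Y_m-\mathbb E_{\theta^\circ}[S^Y_m]>\tfrac{3c_2}{2}(m\sigma_{(m)}+2r_m)\Bigr).$$

The right hand side in the last display is bounded by employing (3.2) in Lemma 3.1,
and hence

$$P_{\theta^\circ}(\Omega^c_m)\le\exp\Bigl(-\frac{c_2(c_2\wedge1)(m\sigma_{(m)}+2r_m)}{4\sigma_{(m)}}\Bigr). \tag{A.3}$$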


By combination of (A.2), (A.3) and $s_m=m\overline\sigma_m+\tfrac{3c_2}{2}m\sigma_{(m)}+(3c_2+1)r_m$ it follows that

$$\mathbb E_{\theta^\circ}P_{\vartheta^m|Y}\Bigl(\|\vartheta^m-\theta^\circ\|^2_{\ell^2}>\mathfrak b_m+m\overline\sigma_m+\tfrac{3c_1}{2}m\sigma_{(m)}+(3c_1+1)\bigl[m\overline\sigma_m+\tfrac{3c_2}{2}m\sigma_{(m)}+(3c_2+1)r_m\bigr]\Bigr)$$
$$\le\exp\Bigl(-\frac{c_1(c_1\wedge1)(3c_2+1)(m\sigma_{(m)}+2r_m)}{4\sigma_{(m)}}\Bigr)+\exp\Bigl(-\frac{c_2(c_2\wedge1)(m\sigma_{(m)}+2r_m)}{4\sigma_{(m)}}\Bigr).$$

The assertion (3.3) follows now by taking $c_1=1/3=c_2$. The proof of the assertion
(3.4) follows along the lines of the proof of (3.3). Let $c_3$ be a positive constant (to be
specified below). Since $\mathbb E_{\vartheta^m|Y}[S^\vartheta_m]\ge m\overline\sigma_m$ it trivially follows from (3.1) in Lemma 3.1
that


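$$\mathbb E_{\theta^\circ}P_{\vartheta^m|Y}\bigl(S^\vartheta_m<m\overline\sigma_m-c_3m\sigma_{(m)}-2c_3s_m\bigr)1_{\Omega_m}
\le\mathbb E_{\theta^\circ}P_{\vartheta^m|Y}\bigl(S^\vartheta_m-\mathbb E_{\vartheta^m|Y}[S^\vartheta_m]<-c_3(m\sigma_{(m)}+2s_m)\bigr)1_{\Omega_m}
\le\exp\Bigl(-\frac{c_3(c_3\wedge1)(m\sigma_{(m)}+2s_m)}{4\sigma_{(m)}}\Bigr).$$

Combining the last bound, the estimate (A.3) and $\mathfrak b_m=\sum_{j>m}(\theta^\circ_j-\theta^\times_j)^2$ it follows that

$$\mathbb E_{\theta^\circ}P_{\vartheta^m|Y}\Bigl(\|\vartheta^m-\theta^\circ\|^2_{\ell^2}<\mathfrak b_m+m\overline\sigma_m-c_3m\sigma_{(m)}-2c_3\bigl[m\overline\sigma_m+\tfrac{3c_2}{2}m\sigma_{(m)}+(3c_2+1)r_m\bigr]\Bigr)$$
$$\le\mathbb E_{\theta^\circ}P_{\vartheta^m|Y}\bigl(S^\vartheta_m<m\overline\sigma_m-c_3m\sigma_{(m)}-2c_3s_m\bigr)1_{\Omega_m}+P_{\theta^\circ}(\Omega^c_m)$$
$$\le\exp\Bigl(-\frac{c_3(c_3\wedge1)(3c_2+1)(m\sigma_{(m)}+2r_m)}{4\sigma_{(m)}}\Bigr)+\exp\Bigl(-\frac{c_2(c_2\wedge1)(m\sigma_{(m)}+2r_m)}{4\sigma_{(m)}}\Bigr).$$

The assertion (3.4) follows now by taking $c_2=1/3$ which completes the proof. □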

Proof of Proposition 3.5. Keeping in mind the notations and findings used in the 
proof of Proposition 3.2 we have 


I ft's 

E,o||0-^ - 9°\\l = Ego J^{eJ - ( 0 ; - 6])^ 

j = l j>nie 

= (A.4) 

i=i 

which together with ( 7 jA^£“^ ^ 1 implies E 5 )o|| 6 '™’= — 6'°||^2 ^ bms+m^ameP''^ms- Exploiting 
the Assumption A.l, that is, Xm^ ^ K\^me V m^am^], we obtain the assertion. □ 

Proof of Theorem 3.8. The assertion follows from (A.4) given in the proof of Propo¬ 
sition 3.5. Indeed, (i) follows by combination of (A.4), ^ and 

Xm ^ d~^\\9 °while (A.4), ^ (l + l/d)"^£mA,„ and ^ 0 

imply together (ii). Note that these elementary bounds hold due to Assumption A .2 for 
all £ G ( 0 ,£o) since <I>? = o(l) as £ —)■ 0 , which completes the proof. □ 

Proof of Theorem 3.10. We start the proof with the observation that due to
Assumption A.3 and (3.14) the sub-family $\{P_{\vartheta^{m^\star_\varepsilon}}\}_{m^\star_\varepsilon}$ satisfies the condition (3.7) uniformly
for all $\theta^\circ\in\Theta^r_{\mathfrak a}$ with $L=L^\star/\kappa^\star$. Moreover, we have $\Phi^\star_\varepsilon=o(1)$ as $\varepsilon\to0$ and we suppose
that Assumption A.2 holds true. Thereby, the assumptions of Corollary 3.6 are satisfied.
From {{1 + 1/d)\Jr/cP){L*/K^) ^ {{1 + 1/d)\/= K and the dehnition of 
K* it follows further that iC* ^ (4+(ll/2)A')(l V r) and {K*)~^ ^ (l/9)(l + l/d)“^K* for 
all 6° e 0^. Moreover, for all 0 < £ < So we have (1 Vr) <F* ^ = [b^* V emt A^j] ^ 

K* By combining these elementary inequalities and Corollary 3.6 with c := 1/(9A') 
and c ^ 1/K* uniformly for all 6° G 0^ we obtain for all £ G (0,£o) 

sup E,oP * -9% > P*<FJ) 

0°e0S ' 

^ sup E,oP * (11^™? -9% > (4+ (11/2)P)C^) 

6»°ees ' 

^ 2exp{—mt/36); (A.5) 


sup^E,oP^™j -9°\\i < {KT^^t) 




^ sup E,oP * (||.9™^-0°||,^^<(l-8cP){(l + l/d)}-^$.^) 

6»°ees ' 

^ 2exp(-mj/[2(P*)2]). (A.6) 

By combining (A.5) and (A.6) we obtain the assertion of the theorem since $m^\star_\varepsilon\to\infty$,
which completes the proof. □

B Appendix: Proofs of Section 4 

B.l Proof of Theorem 4.3 

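Proof of Lemma 4.1. Consider (i). The claim holds trivially true in case $G^-_\varepsilon=1$,
thus suppose $G^-_\varepsilon>1$ and let $1\le m<G^-_\varepsilon\le m^\circ_\varepsilon$. Define $S_m:=\|\widehat\theta^{m^\circ_\varepsilon}-\theta^\times\|^2_\sigma-\|\widehat\theta^m-\theta^\times\|^2_\sigma$.
Given an event $\mathcal A_m$ and its complement $\mathcal A^c_m$ (to be specified below) it follows

$$p_{M|Y}(m)=\frac{\exp\bigl(\tfrac12\{\|\widehat\theta^m-\theta^\times\|^2_\sigma-3C_\lambda m\}\bigr)}{\sum_{k=1}^{G_\varepsilon}\exp\bigl(\tfrac12\{\|\widehat\theta^k-\theta^\times\|^2_\sigma-3C_\lambda k\}\bigr)}
\le\exp\bigl(\tfrac12\{-S_m+3C_\lambda[m^\circ_\varepsilon-m]\}\bigr)1_{\mathcal A_m}+1_{\mathcal A^c_m}. \tag{B.1}$$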


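Moreover, elementary algebra shows

$$S_m=\sum_{j=m+1}^{m^\circ_\varepsilon}\frac{\lambda_j^2\sigma_j}{\varepsilon^2}(Y_j-\lambda_j\theta^\times_j)^2$$

where the random variables $\{\lambda_j\sigma_j^{1/2}\varepsilon^{-1}(Y_j-\lambda_j\theta^\times_j)\}_{j\ge1}$ are independent and normally
distributed with standard deviation $\beta_j=\lambda_j\sigma_j^{1/2}\varepsilon^{-1/2}$ and mean $\alpha_j=\beta_j\varepsilon^{-1/2}\lambda_j(\theta^\circ_j-\theta^\times_j)$.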
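For the reader's convenience, the elementary algebra behind this representation can be spelled out as follows. This is a sketch based only on the definitions $\theta^Y_j=\sigma_j(\lambda_j\varepsilon^{-1}Y_j+\varsigma_j^{-1}\theta^\times_j)$ and $\sigma_j=(\varsigma_j^{-1}+\lambda_j^2\varepsilon^{-1})^{-1}$ recalled in Section 4; it is not part of the original proof.

```latex
\begin{align*}
\theta^Y_j-\theta^\times_j
  &= \sigma_j\lambda_j\varepsilon^{-1}Y_j-(1-\sigma_j\varsigma_j^{-1})\theta^\times_j
   = \sigma_j\lambda_j\varepsilon^{-1}\,(Y_j-\lambda_j\theta^\times_j),
  && \text{since } 1-\sigma_j\varsigma_j^{-1}=\sigma_j\lambda_j^2\varepsilon^{-1},\\
\frac{(\theta^Y_j-\theta^\times_j)^2}{\sigma_j}
  &= \frac{\lambda_j^2\sigma_j}{\varepsilon^2}\,(Y_j-\lambda_j\theta^\times_j)^2 .
\end{align*}
```

Summing the second identity over the indices on which the two posterior mean sequences differ gives the stated sum of squares, and inserting $Y_j=\lambda_j\theta^\circ_j+\sqrt{\varepsilon}\,\xi_j$ with standard normal $\xi_j$ yields the stated mean and standard deviation.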

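Keeping in mind the notations used in Lemma 3.1 define $v_m:=\sum_{j=m+1}^{m^\circ_\varepsilon}\beta_j^2$ and $r_m:=
\sum_{j=m+1}^{m^\circ_\varepsilon}\alpha_j^2$. We observe that Assumption A.2 implies that $1\ge\beta_j^2\ge(1+1/d)^{-1}$ and
hence it follows by employing $\min_{m<j\le m^\circ_\varepsilon}\lambda_j^2\ge\min_{1\le j\le m^\circ_\varepsilon}\lambda_j^2=\Lambda^{-1}_{(m^\circ_\varepsilon)}$ and Assumption
A.4 (iii) that

$$L_\lambda(\varepsilon\Lambda_{(m^\circ_\varepsilon)})^{-1}\Phi^\circ_\varepsilon\ge L_\lambda(\varepsilon\Lambda_{(m^\circ_\varepsilon)})^{-1}\varepsilon m^\circ_\varepsilon\overline\Lambda_{m^\circ_\varepsilon}\ge m^\circ_\varepsilon\quad\text{and}$$
$$(1+1/d)^{-1}(\varepsilon\Lambda_{(m^\circ_\varepsilon)})^{-1}[\mathfrak b_m-\Phi^\circ_\varepsilon]\le(1+1/d)^{-1}(\varepsilon\Lambda_{(m^\circ_\varepsilon)})^{-1}[\mathfrak b_m-\mathfrak b_{m^\circ_\varepsilon}]\le r_m. \tag{B.2}$$

Moreover, we set $t_m:=1\ge\max_{m<j\le m^\circ_\varepsilon}\beta_j^2$ and $\mu_m:=\mathbb ES_m=v_m+r_m$. Introduce the
event $\mathcal A_m:=\{S_m-\mu_m\ge-(1/4)(v_m+2r_m)\}$ and its complement $\mathcal A^c_m:=\{S_m-\mu_m<
-(1/4)(v_m+2r_m)\}$. By employing successively Lemma 3.1, (B.2) and $\mathfrak b_{m^\circ_\varepsilon}\le\Phi^\circ_\varepsilon$ it follows
now from (B.1) that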


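$$\mathbb E_{\theta^\circ}p_{M|Y}(m)\le\mathbb E_{\theta^\circ}\exp\bigl(\{-(S_m-\mu_m)-\mu_m+3C_\lambda[m^\circ_\varepsilon-m]\}/2\bigr)1_{\mathcal A_m}+\mathbb E_{\theta^\circ}1_{\mathcal A^c_m}$$
$$\le\exp\bigl(\{-3v_m/4-r_m/2+3C_\lambda[m^\circ_\varepsilon-m]\}/2\bigr)+\exp\bigl(-(1/64)(v_m+2r_m)\bigr)$$
$$\le\exp\bigl(-r_m/4+3C_\lambda m^\circ_\varepsilon/2\bigr)+\exp\bigl(-r_m/32\bigr)$$
$$\le\exp\Bigl(-\frac{[\mathfrak b_m-\Phi^\circ_\varepsilon]}{4(1+1/d)\varepsilon\Lambda_{(m^\circ_\varepsilon)}}+\frac{3C_\lambda L_\lambda\Phi^\circ_\varepsilon}{2\varepsilon\Lambda_{(m^\circ_\varepsilon)}}\Bigr)+\exp\Bigl(-\frac{[\mathfrak b_m-\Phi^\circ_\varepsilon]}{32(1+1/d)\varepsilon\Lambda_{(m^\circ_\varepsilon)}}\Bigr)$$
$$\le\exp\Bigl(-\frac{\mathfrak b_m}{4(1+1/d)\varepsilon\Lambda_{(m^\circ_\varepsilon)}}+\frac{2C_\lambda L_\lambda\Phi^\circ_\varepsilon}{\varepsilon\Lambda_{(m^\circ_\varepsilon)}}\Bigr)\times\exp\Bigl(-\frac{L_\lambda C_\lambda\Phi^\circ_\varepsilon}{4\varepsilon\Lambda_{(m^\circ_\varepsilon)}}\Bigr)+\exp\Bigl(-\frac{[\mathfrak b_m-\Phi^\circ_\varepsilon]}{32(1+1/d)\varepsilon\Lambda_{(m^\circ_\varepsilon)}}\Bigr).$$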


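Taking into account the definition (4.4) of $G^-_\varepsilon$, i.e., $\mathfrak b_m>8L_\lambda C_\lambda(1+1/d)\Phi^\circ_\varepsilon$ for all
$1\le m<G^-_\varepsilon$, and $L_\lambda\Phi^\circ_\varepsilon(\varepsilon\Lambda_{(m^\circ_\varepsilon)})^{-1}\ge m^\circ_\varepsilon$ due to Assumption A.4 (iii), we obtain

$$\mathbb E_{\theta^\circ}p_{M|Y}(m)\le\exp\Bigl(-\frac{L_\lambda C_\lambda\Phi^\circ_\varepsilon}{4\varepsilon\Lambda_{(m^\circ_\varepsilon)}}\Bigr)+\exp\Bigl(-\frac{7L_\lambda C_\lambda\Phi^\circ_\varepsilon}{32\varepsilon\Lambda_{(m^\circ_\varepsilon)}}\Bigr)\le2\exp\Bigl(-\frac{7C_\lambda}{32}m^\circ_\varepsilon\Bigr).$$

Thereby, $\mathbb E_{\theta^\circ}P_{M|Y}(1\le M<G^-_\varepsilon)=\sum_{m=1}^{G^-_\varepsilon-1}\mathbb E_{\theta^\circ}p_{M|Y}(m)\le2\exp\bigl(-\tfrac{7C_\lambda}{32}m^\circ_\varepsilon+\log G_\varepsilon\bigr)$
using that $G_\varepsilon\ge G^-_\varepsilon$, which proves the assertion (i). Consider now (ii). The claim holds
trivially true in case $G^+_\varepsilon=G_\varepsilon$, thus suppose $G^+_\varepsilon<G_\varepsilon$ and let $G_\varepsilon\ge m>G^+_\varepsilon\ge m^\circ_\varepsilon$.
Consider again the upper bound given in (B.1) where now

$$-S_m=\sum_{j=m^\circ_\varepsilon+1}^{m}\frac{\lambda_j^2\sigma_j}{\varepsilon^2}(Y_j-\lambda_j\theta^\times_j)^2.$$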

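Employing the notations $\alpha_j$ and $\beta_j$ introduced in the proof of (i) and keeping in
mind Lemma 3.1 we define $v_m:=\sum_{j=m^\circ_\varepsilon+1}^{m}\beta_j^2$ and $r_m:=\sum_{j=m^\circ_\varepsilon+1}^{m}\alpha_j^2$ where $1\ge
\beta_j^2\ge(1+1/d)^{-1}$ due to Assumption A.2. Moreover, from Assumption A.4 (i) follows
$\max_{m^\circ_\varepsilon<j\le m}\lambda_j^2\le\max_{m^\circ_\varepsilon<j}\lambda_j^2\le C_\lambda\min_{1\le j\le m^\circ_\varepsilon}\lambda_j^2=C_\lambda\Lambda^{-1}_{(m^\circ_\varepsilon)}$ and taking into account in
addition Assumption A.4 (iii) that

$$L_\lambda(\varepsilon\Lambda_{(m^\circ_\varepsilon)})^{-1}\Phi^\circ_\varepsilon\ge m^\circ_\varepsilon,\qquad v_m\le m-m^\circ_\varepsilon\quad\text{and}$$
$$C_\lambda(\varepsilon\Lambda_{(m^\circ_\varepsilon)})^{-1}\Phi^\circ_\varepsilon\ge C_\lambda(\varepsilon\Lambda_{(m^\circ_\varepsilon)})^{-1}[\mathfrak b_{m^\circ_\varepsilon}-\mathfrak b_m]\ge r_m. \tag{B.3}$$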

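Moreover, we set $t_m:=1\ge\max_{m^\circ_\varepsilon<j\le m}\beta_j^2$ and observe that $C_\lambda[m-m^\circ_\varepsilon]+C_\lambda(\Lambda_{(m^\circ_\varepsilon)}\varepsilon)^{-1}[\mathfrak b_{m^\circ_\varepsilon}-
\mathfrak b_m]\ge\mu_m:=-\mathbb ES_m=v_m+r_m$. Consider now the event $\mathcal A_m:=\{-S_m-\mu_m\le(C_\lambda[m-m^\circ_\varepsilon]+
2C_\lambda(\Lambda_{(m^\circ_\varepsilon)}\varepsilon)^{-1}[\mathfrak b_{m^\circ_\varepsilon}-\mathfrak b_m])\}$ and its complement $\mathcal A^c_m:=\{-S_m-\mu_m>(C_\lambda[m-m^\circ_\varepsilon]+
2C_\lambda(\Lambda_{(m^\circ_\varepsilon)}\varepsilon)^{-1}[\mathfrak b_{m^\circ_\varepsilon}-\mathfrak b_m])\}$. By employing successively Lemma 3.1, (B.3) and $\mathfrak b_{m^\circ_\varepsilon}\le\Phi^\circ_\varepsilon$
it follows now from (B.1) that

$$\mathbb E_{\theta^\circ}p_{M|Y}(m)\le\mathbb E_{\theta^\circ}\exp\bigl(\{(-S_m-\mu_m)+\mu_m+3C_\lambda[m^\circ_\varepsilon-m]\}/2\bigr)1_{\mathcal A_m}+\mathbb E_{\theta^\circ}1_{\mathcal A^c_m}$$
$$\le\exp\bigl(\{2C_\lambda[m-m^\circ_\varepsilon]+3C_\lambda(\Lambda_{(m^\circ_\varepsilon)}\varepsilon)^{-1}[\mathfrak b_{m^\circ_\varepsilon}-\mathfrak b_m]+3C_\lambda[m^\circ_\varepsilon-m]\}/2\bigr)$$
$$\quad+\exp\bigl(-\{C_\lambda[m-m^\circ_\varepsilon]+2C_\lambda(\Lambda_{(m^\circ_\varepsilon)}\varepsilon)^{-1}[\mathfrak b_{m^\circ_\varepsilon}-\mathfrak b_m]\}/9\bigr)$$
$$\le\exp\bigl(\{C_\lambda[m^\circ_\varepsilon-m]+3C_\lambda(\Lambda_{(m^\circ_\varepsilon)}\varepsilon)^{-1}\Phi^\circ_\varepsilon\}/2\bigr)+\exp\bigl(-C_\lambda[m-m^\circ_\varepsilon]/9\bigr)$$
$$\le\exp\bigl(C_\lambda\{-m+3(\Lambda_{(m^\circ_\varepsilon)}\varepsilon)^{-1}\Phi^\circ_\varepsilon+L_\lambda(\varepsilon\Lambda_{(m^\circ_\varepsilon)})^{-1}\Phi^\circ_\varepsilon\}/2\bigr)+\exp\bigl(-C_\lambda\{m-L_\lambda(\varepsilon\Lambda_{(m^\circ_\varepsilon)})^{-1}\Phi^\circ_\varepsilon\}/9\bigr)$$
$$\le\exp\bigl(C_\lambda\{-m+5L_\lambda(\Lambda_{(m^\circ_\varepsilon)}\varepsilon)^{-1}\Phi^\circ_\varepsilon\}/2\bigr)\times\exp\Bigl(-\frac{L_\lambda C_\lambda\Phi^\circ_\varepsilon}{2\Lambda_{(m^\circ_\varepsilon)}\varepsilon}\Bigr)+\exp\bigl(-C_\lambda\{m-L_\lambda(\varepsilon\Lambda_{(m^\circ_\varepsilon)})^{-1}\Phi^\circ_\varepsilon\}/9\bigr).$$

Taking into account the definition (4.4) of $G^+_\varepsilon$, i.e., $m>5L_\lambda(\varepsilon\Lambda_{(m^\circ_\varepsilon)})^{-1}\Phi^\circ_\varepsilon$ for all $G_\varepsilon\ge
m>G^+_\varepsilon$, and $L_\lambda\Phi^\circ_\varepsilon(\varepsilon\Lambda_{(m^\circ_\varepsilon)})^{-1}\ge m^\circ_\varepsilon$ due to Assumption A.4 (iii), we obtain

$$\mathbb E_{\theta^\circ}p_{M|Y}(m)\le\exp\Bigl(-\frac{L_\lambda C_\lambda\Phi^\circ_\varepsilon}{2\varepsilon\Lambda_{(m^\circ_\varepsilon)}}\Bigr)+\exp\Bigl(-\frac{4L_\lambda C_\lambda\Phi^\circ_\varepsilon}{9\varepsilon\Lambda_{(m^\circ_\varepsilon)}}\Bigr)\le2\exp\Bigl(-\frac{4C_\lambda}{9}m^\circ_\varepsilon\Bigr).$$

Thereby, $\mathbb E_{\theta^\circ}P_{M|Y}(G^+_\varepsilon<M\le G_\varepsilon)=\sum_{m=G^+_\varepsilon+1}^{G_\varepsilon}\mathbb E_{\theta^\circ}p_{M|Y}(m)\le2\exp\bigl(-\tfrac{4C_\lambda}{9}m^\circ_\varepsilon+\log G_\varepsilon\bigr)$
which shows the assertion (ii) and completes the proof. □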

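Proof of Lemma 4.2. Consider (i). We start the proof with the observation that
due to Assumption A.4 (iii) the condition (3.7) holds true with $L=L_\lambda$ uniformly for all
$m\in\mathbb N$ and $\varepsilon\in(0,1)$, and hence imposing Assumption A.2 the conditions of Corollary 3.6
are satisfied, which in turn setting $c:=1/(9K)$ with $K:=\bigl((1+d^{-1})\vee d^{-2}\|\theta^\circ-\theta^\times\|^2_{\ell^2}\bigr)L_\lambda$
implies for all $1\le m\le G_\varepsilon$ and $\varepsilon\in(0,\varepsilon_\circ)$ that

$$\mathbb E_{\theta^\circ}P_{\vartheta^m|Y}\bigl(\|\vartheta^m-\theta^\circ\|^2_{\ell^2}>(4+(11/2)K)[\mathfrak b_m\vee\varepsilon m\overline\Lambda_m]\bigr)\le2\exp(-m/36); \tag{B.4}$$

$$\mathbb E_{\theta^\circ}P_{\vartheta^m|Y}\bigl(\|\vartheta^m-\theta^\circ\|^2_{\ell^2}<\{9(1+1/d)\}^{-1}[\mathfrak b_m\vee\varepsilon m\overline\Lambda_m]\bigr)\le2\exp(-m/(162K^2)). \tag{B.5}$$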
9 
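On the other hand, taking into account the definition (4.4) of $G^-_\varepsilon$ and $G^+_\varepsilon$, and the
monotonicity of $(\mathfrak b_m)_{m\ge1}$ and $(\varepsilon m\overline\Lambda_m)_{m\ge1}$ we have for all $G^-_\varepsilon\le m\le m^\circ_\varepsilon$ that

$$\varepsilon m\overline\Lambda_m\le\varepsilon m^\circ_\varepsilon\overline\Lambda_{m^\circ_\varepsilon}\le\Phi^\circ_\varepsilon\quad\text{and}\quad\mathfrak b_m\le8L_\lambda C_\lambda(1+1/d)\Phi^\circ_\varepsilon,$$

while for all $G^+_\varepsilon\ge m\ge m^\circ_\varepsilon$ (keeping in mind Assumption A.5) hold

$$m\le5L_\lambda(\varepsilon\Lambda_{(m^\circ_\varepsilon)})^{-1}\Phi^\circ_\varepsilon\le5L_\lambda(\varepsilon\Lambda_{(m^\circ_\varepsilon)})^{-1}(\kappa^\circ)^{-1}\varepsilon m^\circ_\varepsilon\overline\Lambda_{m^\circ_\varepsilon}\le(5L_\lambda/\kappa^\circ)m^\circ_\varepsilon\le D^\circ m^\circ_\varepsilon\quad\text{and}\quad\mathfrak b_m\le\mathfrak b_{m^\circ_\varepsilon}\le\Phi^\circ_\varepsilon$$

where $D^\circ:=D^\circ(\theta^\times,\theta^\circ,\lambda):=\lceil5L_\lambda/\kappa^\circ\rceil$. Due to Assumption A.4 (ii) and (iii) it follows
from $m\le D^\circ m^\circ_\varepsilon$ that $\Lambda_{(m)}\le\Lambda_{(D^\circ m^\circ_\varepsilon)}\le\Lambda_{(D^\circ)}\Lambda_{(m^\circ_\varepsilon)}$ and $\overline\Lambda_m\le\Lambda_{(m)}\le\Lambda_{(D^\circ)}\Lambda_{(m^\circ_\varepsilon)}\le
\Lambda_{(D^\circ)}L_\lambda\overline\Lambda_{m^\circ_\varepsilon}$ which in turn implies $\varepsilon m\overline\Lambda_m\le L_\lambda D^\circ\Lambda_{(D^\circ)}\varepsilon m^\circ_\varepsilon\overline\Lambda_{m^\circ_\varepsilon}\le L_\lambda D^\circ\Lambda_{(D^\circ)}\Phi^\circ_\varepsilon$ for all
$m^\circ_\varepsilon\le m\le G^+_\varepsilon$. Combining the upper bounds we have $(4+11K/2)[\mathfrak b_m\vee\varepsilon m\overline\Lambda_m]\le K^\circ\Phi^\circ_\varepsilon$ for
all $G^-_\varepsilon\le m\le G^+_\varepsilon$ since $K^\circ\ge(4+11K/2)\bigl(8L_\lambda C_\lambda(1+1/d)\vee L_\lambda D^\circ\Lambda_{(D^\circ)}\bigr)$, and together
with (B.4) follows

$$\sum_{m=G^-_\varepsilon}^{G^+_\varepsilon}\mathbb E_{\theta^\circ}P_{\vartheta^m|Y}\bigl(\|\vartheta^m-\theta^\circ\|^2_{\ell^2}>K^\circ\Phi^\circ_\varepsilon\bigr)
\le\sum_{m=G^-_\varepsilon}^{G^+_\varepsilon}\mathbb E_{\theta^\circ}P_{\vartheta^m|Y}\bigl(\|\vartheta^m-\theta^\circ\|^2_{\ell^2}>(4+(11/2)K)[\mathfrak b_m\vee\varepsilon m\overline\Lambda_m]\bigr)$$
$$\le2\sum_{m=G^-_\varepsilon}^{G^+_\varepsilon}\exp(-m/36)\le74\exp(-G^-_\varepsilon/36)$$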

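which proves the assertion (i). Consider now (ii). We observe that by definition (3.10)
of $\Phi^\circ_\varepsilon$ for all $m\in\mathbb N$ holds $\Phi^\circ_\varepsilon\le[\varepsilon m\overline\Lambda_m\vee\mathfrak b_m]$, and hence $\{9(1+1/d)\}^{-1}[\mathfrak b_m\vee\varepsilon m\overline\Lambda_m]\ge
(K^\circ)^{-1}\Phi^\circ_\varepsilon$ since $K^\circ\ge9(1+1/d)$. Combining the last estimate, (B.5) and $K^\circ\ge10K$ it
follows that

$$\sum_{m=G^-_\varepsilon}^{G^+_\varepsilon}\mathbb E_{\theta^\circ}P_{\vartheta^m|Y}\bigl(\|\vartheta^m-\theta^\circ\|^2_{\ell^2}<(K^\circ)^{-1}\Phi^\circ_\varepsilon\bigr)
\le\sum_{m=G^-_\varepsilon}^{G^+_\varepsilon}\mathbb E_{\theta^\circ}P_{\vartheta^m|Y}\bigl(\|\vartheta^m-\theta^\circ\|^2_{\ell^2}<\{9(1+1/d)\}^{-1}[\mathfrak b_m\vee\varepsilon m\overline\Lambda_m]\bigr)$$
$$\le2\sum_{m=G^-_\varepsilon}^{G^+_\varepsilon}\exp(-m/(K^\circ)^2)\le4(K^\circ)^2\exp(-G^-_\varepsilon/(K^\circ)^2)$$

which shows the assertion (ii) and completes the proof. □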
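Proof of Theorem 4.3. We start the proof with the observation that Lemma 4.1
together with Lemma 4.2 (i) imply

$$\mathbb E_{\theta^\circ}P_{\vartheta^M|Y}\bigl(\|\vartheta^M-\theta^\circ\|^2_{\ell^2}>K^\circ\Phi^\circ_\varepsilon\bigr)=\mathbb E_{\theta^\circ}\sum_{m=1}^{G_\varepsilon}p_{M|Y}(m)P_{\vartheta^m|Y}\bigl(\|\vartheta^m-\theta^\circ\|^2_{\ell^2}>K^\circ\Phi^\circ_\varepsilon\bigr)$$
$$\le\mathbb E_{\theta^\circ}P_{M|Y}(1\le M<G^-_\varepsilon)+\mathbb E_{\theta^\circ}P_{M|Y}(G^+_\varepsilon<M\le G_\varepsilon)+\sum_{m=G^-_\varepsilon}^{G^+_\varepsilon}\mathbb E_{\theta^\circ}P_{\vartheta^m|Y}\bigl(\|\vartheta^m-\theta^\circ\|^2_{\ell^2}>K^\circ\Phi^\circ_\varepsilon\bigr)$$
$$\le4\exp\bigl(-m^\circ_\varepsilon\{C_\lambda/5-\log G_\varepsilon/m^\circ_\varepsilon\}\bigr)+74\exp(-G^-_\varepsilon/36). \tag{B.6}$$

On the other hand, from Lemma 4.1 together with Lemma 4.2 (ii) also follows that

$$\mathbb E_{\theta^\circ}P_{\vartheta^M|Y}\bigl(\|\vartheta^M-\theta^\circ\|^2_{\ell^2}<(K^\circ)^{-1}\Phi^\circ_\varepsilon\bigr)\le\mathbb E_{\theta^\circ}P_{M|Y}(1\le M<G^-_\varepsilon)+\mathbb E_{\theta^\circ}P_{M|Y}(G^+_\varepsilon<M\le G_\varepsilon)$$
$$\quad+\mathbb E_{\theta^\circ}\sum_{m=G^-_\varepsilon}^{G^+_\varepsilon}p_{M|Y}(m)P_{\vartheta^m|Y}\bigl(\|\vartheta^m-\theta^\circ\|^2_{\ell^2}<(K^\circ)^{-1}\Phi^\circ_\varepsilon\bigr)$$
$$\le4\exp\bigl(-m^\circ_\varepsilon\{C_\lambda/5-\log G_\varepsilon/m^\circ_\varepsilon\}\bigr)+4(K^\circ)^2\exp(-G^-_\varepsilon/(K^\circ)^2). \tag{B.7}$$

By combining (B.6) and (B.7) we obtain the assertion of the theorem since $G^-_\varepsilon,m^\circ_\varepsilon\to\infty$
and $\log G_\varepsilon/m^\circ_\varepsilon=o(1)$ as $\varepsilon\to0$, which completes the proof. □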

B.2 Proof of Theorem 4.4 

The next assertion presents a concentration inequality for Gaussian random variables. 
Lemma B.1. Let the assumptions of Lemma 3.1 be satisfied. For all $c\ge0$ we have

$$\sup_{m\ge1}\,(6t_m)^{-1}\exp\Bigl(\frac{c(v_m+2r_m)}{4t_m}\Bigr)\,\mathbb E\Bigl(S_m-\mathbb ES_m-\tfrac32c(v_m+2r_m)\Bigr)_+\le1 \tag{B.8}$$

where $(a)_+:=(a\vee0)$.

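Proof of Lemma B.1. The assertion follows from Lemma 3.1 (keeping in mind that
$c\ge1$), indeed

$$\mathbb E\Bigl(S_m-\mathbb ES_m-\tfrac32c(v_m+2r_m)\Bigr)_+=\int_0^\infty P\Bigl(S_m-\mathbb ES_m>x+\tfrac32c(v_m+2r_m)\Bigr)dx$$
$$=\int_0^\infty P\Bigl(S_m-\mathbb ES_m>\tfrac32\Bigl(\frac{2x}{3(v_m+2r_m)}+c\Bigr)(v_m+2r_m)\Bigr)dx$$
$$\le\int_0^\infty\exp\Bigl(-\frac{\bigl(2x/(3(v_m+2r_m))+c\bigr)(v_m+2r_m)}{4t_m}\Bigr)dx=\int_0^\infty\exp\Bigl(-\frac{2x/3+c(v_m+2r_m)}{4t_m}\Bigr)dx$$
$$=\exp\Bigl(-\frac{c(v_m+2r_m)}{4t_m}\Bigr)\int_0^\infty\exp\Bigl(-\frac{x}{6t_m}\Bigr)dx=\exp\Bigl(-\frac{c(v_m+2r_m)}{4t_m}\Bigr)(6t_m).$$

□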


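Lemma B.2. If Assumptions A.2 and A.4 hold true then for all $\varepsilon\in(0,\varepsilon_\circ)$

(i) $\sum_{j=1}^{G_\varepsilon}\sigma_j^2\lambda_j^2\varepsilon^{-2}\,\mathbb E_{\theta^\circ}\bigl\{(Y_j-\lambda_j\theta^\circ_j)P_{M|Y}(j\le M\le G_\varepsilon)\bigr\}^2
\le\varepsilon G^+_\varepsilon\overline\Lambda_{G^+_\varepsilon}+10\Lambda_1\exp\bigl(-m^\circ_\varepsilon/5+2\log G_\varepsilon\bigr)$;

(ii) $\sum_{j=1}^{G_\varepsilon}(\theta^\times_j-\theta^\circ_j)^2\,\mathbb E_{\theta^\circ}\mathbb E_{M|Y}\bigl\{1\{1\le M<j\}+(\sigma_j/\varsigma_j)^2\,1\{j\le M\le G_\varepsilon\}\bigr\}+\sum_{j>G_\varepsilon}(\theta^\times_j-\theta^\circ_j)^2
\le\mathfrak b_{G^-_\varepsilon}+\|\theta^\times-\theta^\circ\|^2_{\ell^2}\bigl\{d^{-2}\varepsilon\Lambda_{(G^-_\varepsilon)}+2\exp\bigl(-m^\circ_\varepsilon/5+\log G_\varepsilon\bigr)\bigr\}$.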

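Proof of Lemma B.2. Consider (i). We start with the observation that the random
variables $\{\xi_j:=\varepsilon^{-1/2}(Y_j-\lambda_j\theta^\circ_j)\}_{j\ge1}$ are independent and standard normally distributed.
Moreover, applying Jensen's inequality we have

$$\{\xi_jP_{M|Y}(j\le M\le G_\varepsilon)\}^2=\xi_j^2\{\mathbb E_{M|Y}1\{j\le M\le G_\varepsilon\}\}^2\le\xi_j^2\,\mathbb E_{M|Y}1\{j\le M\le G_\varepsilon\}.$$

We split the sum into two parts which we bound separately. Precisely,

$$\sum_{j=1}^{G_\varepsilon}\sigma_j^2\lambda_j^2\varepsilon^{-2}\bigl\{(Y_j-\lambda_j\theta^\circ_j)P_{M|Y}(j\le M\le G_\varepsilon)\bigr\}^2
\le\sum_{j=1}^{G^+_\varepsilon}\varepsilon\Lambda_j\xi_j^2+\sum_{j=1}^{G_\varepsilon}\varepsilon\Lambda_j\xi_j^2\,P_{M|Y}(G^+_\varepsilon<M\le G_\varepsilon) \tag{B.9}$$

where we used that $\sigma_j\le\varepsilon\Lambda_j$. Keeping in mind the notations used in Lemma B.1
let $S_{G_\varepsilon}:=\sum_{j=1}^{G_\varepsilon}\varepsilon\Lambda_j\xi_j^2$, observe that $\alpha_j=0$ and $\beta_j^2=\varepsilon\Lambda_j$, and hence $r_{G_\varepsilon}=0$.
Keeping in mind that by definition of $G_\varepsilon$ we have $\varepsilon\Lambda_{(G_\varepsilon)}\le\Lambda_1$, we set $t_{G_\varepsilon}:=\Lambda_1\ge
\varepsilon\Lambda_{(G_\varepsilon)}=\max_{1\le j\le G_\varepsilon}\beta_j^2$ and $v_{G_\varepsilon}:=\Lambda_1G_\varepsilon=G_\varepsilon t_{G_\varepsilon}\ge\sum_{j=1}^{G_\varepsilon}\beta_j^2$ where $\mathbb E_{\theta^\circ}S_{G_\varepsilon}\le v_{G_\varepsilon}$. From
Lemma B.1 with $c=2/3$ follows that $\mathbb E_{\theta^\circ}(S_{G_\varepsilon}-2\Lambda_1G_\varepsilon)_+\le(6t_{G_\varepsilon})\exp(-v_{G_\varepsilon}/(6t_{G_\varepsilon}))=
(6\Lambda_1)\exp(-G_\varepsilon/6)$, and hence

$$\sum_{j=1}^{G_\varepsilon}\varepsilon\Lambda_j\,\mathbb E_{\theta^\circ}\{\xi_j^2P_{M|Y}(G^+_\varepsilon<M\le G_\varepsilon)\}
\le\mathbb E_{\theta^\circ}(S_{G_\varepsilon}-2\Lambda_1G_\varepsilon)_++2\Lambda_1G_\varepsilon\,\mathbb E_{\theta^\circ}P_{M|Y}(G^+_\varepsilon<M\le G_\varepsilon)$$
$$\le6\Lambda_1\exp(-G_\varepsilon/6)+2\Lambda_1G_\varepsilon\,\mathbb E_{\theta^\circ}P_{M|Y}(G^+_\varepsilon<M\le G_\varepsilon). \tag{B.10}$$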

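We distinguish two cases. First, if $G^+_\varepsilon=G_\varepsilon$, then assertion (i) follows by combining
(B.9) and $\mathbb E_{\theta^\circ}\xi_j^2=1$. Second, if $G^+_\varepsilon<G_\varepsilon$, then the definition (4.4) of $G^+_\varepsilon$ implies $G_\varepsilon>
5m^\circ_\varepsilon$ which in turn implies the assertion (i) by combining (B.9), $\mathbb E_{\theta^\circ}\xi_j^2=1$, (B.10) and
Lemma 4.1 (ii). Consider (ii). Due to Assumption A.2 we have $(\sigma_j/\varsigma_j)^2\le(1\wedge d^{-2}\varepsilon\Lambda_j)$
which we will use without further reference. Splitting the first sum into two parts we
obtain

$$\sum_{j=1}^{G_\varepsilon}(\theta^\times_j-\theta^\circ_j)^2\,\mathbb E_{\theta^\circ}\bigl\{1\{1\le M<j\}+(\sigma_j/\varsigma_j)^2\,1\{j\le M\le G_\varepsilon\}\bigr\}+\sum_{j>G_\varepsilon}(\theta^\times_j-\theta^\circ_j)^2$$
$$\le\sum_{j=1}^{G^-_\varepsilon}(\theta^\times_j-\theta^\circ_j)^2\,\mathbb E_{\theta^\circ}\bigl\{1\{1\le M<j\}+d^{-2}\varepsilon\Lambda_j\bigr\}
+\sum_{j=G^-_\varepsilon+1}^{G_\varepsilon}(\theta^\times_j-\theta^\circ_j)^2\,\mathbb E_{\theta^\circ}\bigl\{1\{1\le M<j\}+1\{j\le M\le G_\varepsilon\}\bigr\}+\sum_{j>G_\varepsilon}(\theta^\times_j-\theta^\circ_j)^2$$
$$\le\|\theta^\times-\theta^\circ\|^2_{\ell^2}\bigl\{\mathbb E_{\theta^\circ}P_{M|Y}(1\le M<G^-_\varepsilon)+d^{-2}\varepsilon\Lambda_{(G^-_\varepsilon)}\bigr\}+\sum_{j>G^-_\varepsilon}(\theta^\times_j-\theta^\circ_j)^2.$$

The assertion (ii) follows now by combining the last estimate and Lemma 4.1 (i), which
completes the proof. □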

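Proof of Theorem 4.4. We start the proof with the observation that $\widehat\theta_j-\theta^\circ_j=
(\theta^Y_j-\theta^\circ_j)P_{M|Y}(j\le M\le G_\varepsilon)+(\theta^\times_j-\theta^\circ_j)P_{M|Y}(1\le M<j)$ for all $1\le j\le G_\varepsilon$ and $\widehat\theta_j-\theta^\circ_j=
\theta^\times_j-\theta^\circ_j$ for all $j>G_\varepsilon$. From the identity $\theta^Y_j-\theta^\circ_j=(\sigma_j/\varsigma_j)(\theta^\times_j-\theta^\circ_j)+(\sigma_j\lambda_j\varepsilon^{-1})(Y_j-\lambda_j\theta^\circ_j)$
and Lemma B.2 follows that

$$\mathbb E_{\theta^\circ}\|\widehat\theta-\theta^\circ\|^2_{\ell^2}\le\sum_{j=1}^{G_\varepsilon}2\sigma_j^2\lambda_j^2\varepsilon^{-2}\,\mathbb E_{\theta^\circ}\bigl\{(Y_j-\lambda_j\theta^\circ_j)P_{M|Y}(j\le M\le G_\varepsilon)\bigr\}^2$$
$$\quad+\sum_{j=1}^{G_\varepsilon}2(\theta^\times_j-\theta^\circ_j)^2\,\mathbb E_{\theta^\circ}\bigl\{(\sigma_j/\varsigma_j)P_{M|Y}(j\le M\le G_\varepsilon)+P_{M|Y}(1\le M<j)\bigr\}^2+\sum_{j>G_\varepsilon}(\theta^\times_j-\theta^\circ_j)^2$$
$$\le2\bigl\{\varepsilon G^+_\varepsilon\overline\Lambda_{G^+_\varepsilon}+10\Lambda_1\exp\bigl(-m^\circ_\varepsilon/5+2\log G_\varepsilon\bigr)\bigr\}
+2\bigl\{\mathfrak b_{G^-_\varepsilon}+\|\theta^\times-\theta^\circ\|^2_{\ell^2}\{d^{-2}\varepsilon\Lambda_{(G^-_\varepsilon)}+2\exp\bigl(-m^\circ_\varepsilon/5+\log G_\varepsilon\bigr)\}\bigr\}.$$

On the other hand, taking into account the definition (4.4) of $G^-_\varepsilon$ and $G^+_\varepsilon$, we
have shown in the proof of Lemma 4.2 that $\mathfrak b_{G^-_\varepsilon}\le8L_\lambda C_\lambda(1+1/d)\Phi^\circ_\varepsilon$ and $\varepsilon G^+_\varepsilon\overline\Lambda_{G^+_\varepsilon}\le
L_\lambda D^\circ\Lambda_{(D^\circ)}\Phi^\circ_\varepsilon$ while trivially $\varepsilon\Lambda_{(G^-_\varepsilon)}\le\varepsilon\Lambda_{(m^\circ_\varepsilon)}\le\Phi^\circ_\varepsilon$. By combination of these estimates
we obtain

$$\mathbb E_{\theta^\circ}\|\widehat\theta-\theta^\circ\|^2_{\ell^2}\le\bigl\{2L_\lambda D^\circ\Lambda_{(D^\circ)}+16L_\lambda C_\lambda(1+1/d)+2d^{-2}\|\theta^\times-\theta^\circ\|^2_{\ell^2}\bigr\}\Phi^\circ_\varepsilon$$
$$\quad+\bigl\{(20\Lambda_1+4\|\theta^\times-\theta^\circ\|^2_{\ell^2})\exp\bigl(-m^\circ_\varepsilon/5+2\log G_\varepsilon-\log\Phi^\circ_\varepsilon\bigr)\bigr\}\Phi^\circ_\varepsilon.$$

From the last bound follows the assertion of the theorem since $(2\log G_\varepsilon-\log\Phi^\circ_\varepsilon)/m^\circ_\varepsilon\to0$
as $\varepsilon\to0$, which completes the proof. □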


B.3 Proof of Theorem 4.6 


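Proof of Lemma 4.5. The proof follows along the lines of the proof of Lemma 4.1,
where we replace $G^-_\varepsilon$, $G^+_\varepsilon$, $m^\circ_\varepsilon$ and $\Phi^\circ_\varepsilon$ by their counterparts $G^{\star-}_\varepsilon$, $G^{\star+}_\varepsilon$, $m^\star_\varepsilon$ and $\Phi^\star_\varepsilon$,
respectively. Moreover, we will use without further reference that for all $\theta^\circ\in\Theta^r_{\mathfrak a}$ the bias is
bounded by $\mathfrak b_m\le r\mathfrak a_m$ for all $m\in\mathbb N$, and hence $\mathfrak b_{m^\star_\varepsilon}\le(1\vee r)\Phi^\star_\varepsilon$.

Consider (i). The claim holds trivially true in case $G^{\star-}_\varepsilon=1$, thus suppose $G^{\star-}_\varepsilon>1$
and let $1\le m<G^{\star-}_\varepsilon\le m^\star_\varepsilon$. Define $S_m:=\|\widehat\theta^{m^\star_\varepsilon}-\theta^\times\|^2_\sigma-\|\widehat\theta^m-\theta^\times\|^2_\sigma$. Let $\mathcal A_m$ and $\mathcal A^c_m$,
respectively, be an event and its complement defined as in the proof of Lemma 4.1, then
it follows

$$p_{M|Y}(m)\le\exp\bigl(\tfrac12\{-S_m+3C_\lambda[m^\star_\varepsilon-m]\}\bigr)1_{\mathcal A_m}+1_{\mathcal A^c_m} \tag{B.11}$$

where $S_m=\sum_{j=m+1}^{m^\star_\varepsilon}\lambda_j^2\sigma_j\varepsilon^{-2}(Y_j-\lambda_j\theta^\times_j)^2$. We use the notation introduced in the proof of Lemma
4.1, where again $1\ge\beta_j^2\ge(1+1/d)^{-1}$ due to Assumption A.2 and by employing
$\min_{m<j\le m^\star_\varepsilon}\lambda_j^2\ge\Lambda^{-1}_{(m^\star_\varepsilon)}$ together with Assumption A.4 (iii)

$$L_\lambda(\varepsilon\Lambda_{(m^\star_\varepsilon)})^{-1}(1\vee r)\Phi^\star_\varepsilon\ge L_\lambda(\varepsilon\Lambda_{(m^\star_\varepsilon)})^{-1}\varepsilon m^\star_\varepsilon\overline\Lambda_{m^\star_\varepsilon}\ge m^\star_\varepsilon\quad\text{and}$$
$$(1+1/d)^{-1}(\varepsilon\Lambda_{(m^\star_\varepsilon)})^{-1}[\mathfrak b_m-(1\vee r)\Phi^\star_\varepsilon]\le(1+1/d)^{-1}(\varepsilon\Lambda_{(m^\star_\varepsilon)})^{-1}[\mathfrak b_m-\mathfrak b_{m^\star_\varepsilon}]\le r_m. \tag{B.12}$$

By employing successively Lemma 3.1, (B.12) and $\mathfrak b_{m^\star_\varepsilon}\le(1\vee r)\Phi^\star_\varepsilon$ for all $\theta^\circ\in\Theta^r_{\mathfrak a}$ it
follows now from (B.11) that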


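$$\mathbb E_{\theta^\circ}p_{M|Y}(m)\le\exp\Bigl(-\frac{\mathfrak b_m}{4(1+1/d)\varepsilon\Lambda_{(m^\star_\varepsilon)}}+\frac{2C_\lambda L_\lambda(1\vee r)\Phi^\star_\varepsilon}{\varepsilon\Lambda_{(m^\star_\varepsilon)}}\Bigr)\times\exp\Bigl(-\frac{L_\lambda C_\lambda(1\vee r)\Phi^\star_\varepsilon}{4\varepsilon\Lambda_{(m^\star_\varepsilon)}}\Bigr)
+\exp\Bigl(-\frac{[\mathfrak b_m-(1\vee r)\Phi^\star_\varepsilon]}{32(1+1/d)\varepsilon\Lambda_{(m^\star_\varepsilon)}}\Bigr).$$

Taking into account the definition (4.5) of $G^{\star-}_\varepsilon$, i.e., $\mathfrak b_m>8L_\lambda C_\lambda(1+1/d)(1\vee r)\Phi^\star_\varepsilon$ for
all $1\le m<G^{\star-}_\varepsilon$, and $L_\lambda\Phi^\star_\varepsilon(\varepsilon\Lambda_{(m^\star_\varepsilon)})^{-1}\ge m^\star_\varepsilon$ due to Assumption A.4 (iii), we obtain

$$\mathbb E_{\theta^\circ}p_{M|Y}(m)\le\exp\Bigl(-\frac{L_\lambda C_\lambda(1\vee r)\Phi^\star_\varepsilon}{4\varepsilon\Lambda_{(m^\star_\varepsilon)}}\Bigr)+\exp\Bigl(-\frac{7L_\lambda C_\lambda(1\vee r)\Phi^\star_\varepsilon}{32\varepsilon\Lambda_{(m^\star_\varepsilon)}}\Bigr)\le2\exp\Bigl(-\frac{7C_\lambda(1\vee r)}{32}m^\star_\varepsilon\Bigr).$$

Thereby, $\mathbb E_{\theta^\circ}P_{M|Y}(1\le M<G^{\star-}_\varepsilon)=\sum_{m=1}^{G^{\star-}_\varepsilon-1}\mathbb E_{\theta^\circ}p_{M|Y}(m)\le2\exp\bigl(-\tfrac{7C_\lambda(1\vee r)}{32}m^\star_\varepsilon+
\log G_\varepsilon\bigr)$ using that $G_\varepsilon\ge G^{\star-}_\varepsilon$, which proves the assertion (i). Consider now (ii). The
claim holds trivially true in case $G^{\star+}_\varepsilon=G_\varepsilon$, thus suppose $G^{\star+}_\varepsilon<G_\varepsilon$ and let $G_\varepsilon\ge m>
G^{\star+}_\varepsilon\ge m^\star_\varepsilon$. Consider the upper bound (B.11) where $-S_m=\sum_{j=m^\star_\varepsilon+1}^{m}\lambda_j^2\sigma_j\varepsilon^{-2}(Y_j-\lambda_j\theta^\times_j)^2$.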
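Employing the notations introduced in the proof of Lemma 4.1 where we had $1\ge
\beta_j^2\ge(1+1/d)^{-1}$ due to Assumption A.2, we obtain from Assumption A.4 (i) that
$\max_{m^\star_\varepsilon<j\le m}\lambda_j^2\le C_\lambda\Lambda^{-1}_{(m^\star_\varepsilon)}$ and taking into account in addition Assumption A.4 (iii) that

$$L_\lambda(\varepsilon\Lambda_{(m^\star_\varepsilon)})^{-1}(1\vee r)\Phi^\star_\varepsilon\ge(1\vee r)m^\star_\varepsilon,\qquad v_m\le m-m^\star_\varepsilon\quad\text{and}$$
$$C_\lambda(\varepsilon\Lambda_{(m^\star_\varepsilon)})^{-1}(1\vee r)\Phi^\star_\varepsilon\ge C_\lambda(\varepsilon\Lambda_{(m^\star_\varepsilon)})^{-1}[\mathfrak b_{m^\star_\varepsilon}-\mathfrak b_m]\ge r_m. \tag{B.13}$$

By employing successively Lemma 3.1, (B.13) and $\mathfrak b_{m^\star_\varepsilon}\le(1\vee r)\Phi^\star_\varepsilon$ it follows now from
(B.11) that

$$\mathbb E_{\theta^\circ}p_{M|Y}(m)\le\exp\bigl(C_\lambda\{-m+5L_\lambda(\Lambda_{(m^\star_\varepsilon)}\varepsilon)^{-1}(1\vee r)\Phi^\star_\varepsilon\}/2\bigr)\times\exp\Bigl(-\frac{L_\lambda C_\lambda(1\vee r)\Phi^\star_\varepsilon}{2\Lambda_{(m^\star_\varepsilon)}\varepsilon}\Bigr)
+\exp\bigl(-C_\lambda\{m-L_\lambda(\varepsilon\Lambda_{(m^\star_\varepsilon)})^{-1}(1\vee r)\Phi^\star_\varepsilon\}/9\bigr).$$

Taking into account the definition (4.5) of $G^{\star+}_\varepsilon$, i.e., $m>5L_\lambda(\varepsilon\Lambda_{(m^\star_\varepsilon)})^{-1}(1\vee r)\Phi^\star_\varepsilon$ for all
$G_\varepsilon\ge m>G^{\star+}_\varepsilon$, and $L_\lambda(1\vee r)\Phi^\star_\varepsilon(\varepsilon\Lambda_{(m^\star_\varepsilon)})^{-1}\ge(1\vee r)m^\star_\varepsilon$ due to Assumption A.4 (iii), we
obtain

$$\mathbb E_{\theta^\circ}p_{M|Y}(m)\le\exp\Bigl(-\frac{L_\lambda C_\lambda(1\vee r)\Phi^\star_\varepsilon}{2\varepsilon\Lambda_{(m^\star_\varepsilon)}}\Bigr)+\exp\Bigl(-\frac{4L_\lambda C_\lambda(1\vee r)\Phi^\star_\varepsilon}{9\varepsilon\Lambda_{(m^\star_\varepsilon)}}\Bigr)\le2\exp\Bigl(-\frac{4C_\lambda(1\vee r)}{9}m^\star_\varepsilon\Bigr).$$

Thereby, $\mathbb E_{\theta^\circ}P_{M|Y}(G^{\star+}_\varepsilon<M\le G_\varepsilon)=\sum_{m=G^{\star+}_\varepsilon+1}^{G_\varepsilon}\mathbb E_{\theta^\circ}p_{M|Y}(m)\le2\exp\bigl(-\tfrac{4C_\lambda(1\vee r)}{9}m^\star_\varepsilon+
\log G_\varepsilon\bigr)$ which shows the assertion (ii) and completes the proof. □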


Proof of Theorem 4.6. We start the proof with the observation that due to
Assumption A.4 (iii) the condition (3.7) holds true with $L=L_\lambda$ uniformly for all $m\in\mathbb N$
and $\varepsilon\in(0,1)$, and hence imposing Assumption A.2 the conditions of Corollary 3.6
(3.8) are satisfied, which in turn implies, by setting $K:=\bigl((1+1/d)\vee r/d^2\bigr)L_\lambda\ge
\bigl((1+d^{-1})\vee d^{-2}\|\theta^\circ-\theta^\times\|^2_{\ell^2}\bigr)L_\lambda$, that for all $1\le m\le G_\varepsilon$ and $\varepsilon\in(0,\varepsilon_\star)$

$$\mathbb E_{\theta^\circ}P_{\vartheta^m|Y}\bigl(\|\vartheta^m-\theta^\circ\|^2_{\ell^2}>(4+(11/2)K)[\mathfrak b_m\vee\varepsilon m\overline\Lambda_m]\bigr)\le2\exp(-m/36). \tag{B.14}$$

Moreover, exploiting the inequality below (A.3) with $c_1=1/3$ and $c_2\ge1$, it is possible
to prove a slightly modified version of Corollary 3.6 (3.8) which implies for all $c_2\ge1$

$$\mathbb E_{\theta^\circ}P_{\vartheta^m|Y}\bigl(\|\vartheta^m-\theta^\circ\|^2_{\ell^2}>16c_2K[\mathfrak b_m\vee\varepsilon m\overline\Lambda_m]\bigr)\le2\exp(-c_2m/12). \tag{B.15}$$

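Consider (i). Following line by line the proof of Lemma 4.2 (i), using (B.14) rather
than (B.4) and exploiting $[\mathfrak b_m\vee\varepsilon m\overline\Lambda_m]\le8L_\lambda C_\lambda(1+1/d)(1\vee r)\Phi^\star_\varepsilon$ for all $G^{\star-}_\varepsilon\le m\le m^\star_\varepsilon$
and $[\mathfrak b_m\vee\varepsilon m\overline\Lambda_m]\le L_\lambda D^\star\Lambda_{(D^\star)}(1\vee r)\Phi^\star_\varepsilon$ with $D^\star:=\lceil5L_\lambda/\kappa^\star\rceil$ for all $m^\star_\varepsilon\le m\le G^{\star+}_\varepsilon$
(keep in mind that $m\le D^\star m^\star_\varepsilon$), we obtain

$$\sum_{m=G^{\star-}_\varepsilon}^{G^{\star+}_\varepsilon}\mathbb E_{\theta^\circ}P_{\vartheta^m|Y}\bigl(\|\vartheta^m-\theta^\circ\|^2_{\ell^2}>K^\star\Phi^\star_\varepsilon\bigr)
\le\sum_{m=G^{\star-}_\varepsilon}^{G^{\star+}_\varepsilon}\mathbb E_{\theta^\circ}P_{\vartheta^m|Y}\bigl(\|\vartheta^m-\theta^\circ\|^2_{\ell^2}>(4+(11/2)K)[\mathfrak b_m\vee\varepsilon m\overline\Lambda_m]\bigr)$$
$$\le2\sum_{m=G^{\star-}_\varepsilon}^{G^{\star+}_\varepsilon}\exp(-m/36)\le74\exp(-G^{\star-}_\varepsilon/36).$$

Combining the last estimate, Lemma 4.5 and the decomposition (B.6) used in the proof
of Theorem 4.3 (with $G^-_\varepsilon$ and $G^+_\varepsilon$ replaced by $G^{\star-}_\varepsilon$, $G^{\star+}_\varepsilon$) it follows that

$$\mathbb E_{\theta^\circ}P_{\vartheta^M|Y}\bigl(\|\vartheta^M-\theta^\circ\|^2_{\ell^2}>K^\star\Phi^\star_\varepsilon\bigr)\le4\exp\bigl(-m^\star_\varepsilon\{C_\lambda/5-\log G_\varepsilon/m^\star_\varepsilon\}\bigr)+74\exp(-G^{\star-}_\varepsilon/36). \tag{B.16}$$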

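Taking into account that $m^\star_\varepsilon\to\infty$ and $\log G_\varepsilon/m^\star_\varepsilon=o(1)$ as $\varepsilon\to0$, we obtain the
assertion (i) of the theorem for any $\theta^\circ\in\Theta^r_{\mathfrak a}$ such that $G^{\star-}_\varepsilon\to\infty$ as $\varepsilon\to0$. On the
other hand, if $\theta^\circ\in\Theta^r_{\mathfrak a}$ is such that $G^{\star-}_\varepsilon\not\to\infty$, i.e., $\sup_\varepsilon G^{\star-}_\varepsilon<\infty$, then there exists
$\varepsilon_\circ\in(0,1)$ such that $G^{\star-}_\varepsilon=G^{\star-}_{\varepsilon_\circ}$ for all $\varepsilon\in(0,\varepsilon_\circ)$ (keep in mind that $(G^{\star-}_\varepsilon)_\varepsilon$ is an
integer-valued monotonically increasing sequence). Moreover, by construction $\mathfrak b_{G^{\star-}_{\varepsilon_\circ}}\le
8L_\lambda C_\lambda(1+1/d)(1\vee r)\Phi^\star_\varepsilon$ for all $\varepsilon\in(0,\varepsilon_\circ)$ which in turn implies $\mathfrak b_m\le\mathfrak b_{G^{\star-}_{\varepsilon_\circ}}=0$ for all
$m\ge G^{\star-}_{\varepsilon_\circ}$, since $\Phi^\star_\varepsilon=o(1)$ as $\varepsilon\to0$. Thereby, for all $m\ge G^{\star-}_\varepsilon$ follows $\Phi^\star_\varepsilon/[\mathfrak b_m\vee\varepsilon m\overline\Lambda_m]=
\Phi^\star_\varepsilon/[\varepsilon m\overline\Lambda_m]\ge[\varepsilon m^\star_\varepsilon\overline\Lambda_{m^\star_\varepsilon}]/[\varepsilon m\overline\Lambda_m]\ge m^\star_\varepsilon/[L_\lambda m]$ using that $L_\lambda\overline\Lambda_{m^\star_\varepsilon}\ge\Lambda_{(m^\star_\varepsilon)}\ge\Lambda_{(m)}\ge
\overline\Lambda_m$ due to Assumption A.4 (iii), which in turn together with $K^\star\Phi^\star_\varepsilon/[\mathfrak b_m\vee\varepsilon m\overline\Lambda_m]\ge
K^\star m^\star_\varepsilon/[L_\lambda m]=16c_2K$, $c_2:=(8C_\lambda(1+1/d)\vee D^\star\Lambda_{(D^\star)})(1\vee r)m^\star_\varepsilon/m\ge1$ and (B.15)
implies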

Gp 

^ E0oP^m^y{\\^-^-9°\\l > P*<F*) 

m=Ge~ 

P 2exp(—(8 Ga( 1 + 1/d) V P*A(o*))(l V r)mj/12 -h logG^) 

P 2exp(—GAmj/5-f logGe). 
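Consequently, we have

$$\mathbb E_{\theta^\circ}P_{\vartheta^M|Y}\bigl(\|\vartheta^M-\theta^\circ\|^2_{\ell^2}>K^\star\Phi^\star_\varepsilon\bigr)\le6\exp\bigl(-m^\star_\varepsilon\{C_\lambda/5-\log G_\varepsilon/m^\star_\varepsilon\}\bigr)$$

which shows that assertion (i) holds for any $\theta^\circ\in\Theta^r_{\mathfrak a}$ since $m^\star_\varepsilon\to\infty$ and $\log G_\varepsilon/m^\star_\varepsilon=
o(1)$ as $\varepsilon\to0$. Consider (ii). Employing that for all $\theta^\circ\in\Theta^r_{\mathfrak a}$ it holds $K^\star\Phi^\star_\varepsilon\ge
16K[\mathfrak b_m\vee\varepsilon m\overline\Lambda_m]$ for all $G^{\star-}_\varepsilon\le m\le G^{\star+}_\varepsilon$ it follows that $K_\varepsilon\Phi^\star_\varepsilon/[\mathfrak b_m\vee\varepsilon m\overline\Lambda_m]\ge16c_2K$
where $c_2:=K_\varepsilon/K^\star\ge12$ for all $\varepsilon\in(0,\tilde\varepsilon_\star)$ since $K_\varepsilon\to\infty$ as $\varepsilon\to0$. Therefore, by
applying (B.15) we have

$$\sum_{m=G^{\star-}_\varepsilon}^{G^{\star+}_\varepsilon}\mathbb E_{\theta^\circ}P_{\vartheta^m|Y}\bigl(\|\vartheta^m-\theta^\circ\|^2_{\ell^2}>K_\varepsilon\Phi^\star_\varepsilon\bigr)\le4\exp(-K_\varepsilon/[12K^\star])$$

and hence from Lemma 4.5 follows for all $\varepsilon\le(\varepsilon_\star\wedge\tilde\varepsilon_\star)$

$$\mathbb E_{\theta^\circ}P_{\vartheta^M|Y}\bigl(\|\vartheta^M-\theta^\circ\|^2_{\ell^2}>K_\varepsilon\Phi^\star_\varepsilon\bigr)\le4\exp\bigl(-m^\star_\varepsilon\{C_\lambda/5-\log G_\varepsilon/m^\star_\varepsilon\}\bigr)+4\exp(-K_\varepsilon/[12K^\star]).$$

Observe that $(\varepsilon_\star\wedge\tilde\varepsilon_\star)$ depends only on the class $\Theta^r_{\mathfrak a}$ and thus the upper bound given
in the last display holds true uniformly for all $\theta^\circ\in\Theta^r_{\mathfrak a}$, which implies the assertion (ii)
by using that $K_\varepsilon\to\infty$, $m^\star_\varepsilon\to\infty$ and $\log G_\varepsilon/m^\star_\varepsilon=o(1)$ as $\varepsilon\to0$, and completes the
proof. □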

B.4 Proof of Theorem 4.7 

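Lemma B.3. If Assumptions A.2 and A.4 hold true then for all $\theta^\circ\in\Theta^r_{\mathfrak a}$ and $\varepsilon\in(0,\varepsilon_\star)$

(i) $\sum_{j=1}^{G_\varepsilon}\sigma_j^2\lambda_j^2\varepsilon^{-2}\,\mathbb E_{\theta^\circ}\bigl\{(Y_j-\lambda_j\theta^\circ_j)P_{M|Y}(j\le M\le G_\varepsilon)\bigr\}^2
\le\varepsilon G^{\star+}_\varepsilon\overline\Lambda_{G^{\star+}_\varepsilon}+10\Lambda_1\exp\bigl(-m^\star_\varepsilon/5+2\log G_\varepsilon\bigr)$;

(ii) $\sum_{j=1}^{G_\varepsilon}(\theta^\times_j-\theta^\circ_j)^2\,\mathbb E_{\theta^\circ}\mathbb E_{M|Y}\bigl\{1\{1\le M<j\}+(\sigma_j/\varsigma_j)^2\,1\{j\le M\le G_\varepsilon\}\bigr\}+\sum_{j>G_\varepsilon}(\theta^\times_j-\theta^\circ_j)^2
\le\mathfrak b_{G^{\star-}_\varepsilon}+\|\theta^\times-\theta^\circ\|^2_{\ell^2}\bigl\{d^{-2}\varepsilon\Lambda_{(G^{\star-}_\varepsilon)}+2\exp\bigl(-\tfrac15m^\star_\varepsilon+\log G_\varepsilon\bigr)\bigr\}$.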

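Proof of Lemma B.3. The proof follows along the lines of the proof of Lemma B.2,
where we replace $G^-_\varepsilon$, $G^+_\varepsilon$, $m^\circ_\varepsilon$ and $\Phi^\circ_\varepsilon$ by their counterparts $G^{\star-}_\varepsilon$, $G^{\star+}_\varepsilon$, $m^\star_\varepsilon$ and $\Phi^\star_\varepsilon$,
respectively.

Consider (i). Following the proof of (B.9) it is straightforward to see that

$$\sum_{j=1}^{G_\varepsilon}\sigma_j^2\lambda_j^2\varepsilon^{-2}\bigl\{(Y_j-\lambda_j\theta^\circ_j)P_{M|Y}(j\le M\le G_\varepsilon)\bigr\}^2
\le\sum_{j=1}^{G^{\star+}_\varepsilon}\varepsilon\Lambda_j\xi_j^2+\sum_{j=1}^{G_\varepsilon}\varepsilon\Lambda_j\xi_j^2\,P_{M|Y}(G^{\star+}_\varepsilon<M\le G_\varepsilon) \tag{B.17}$$

and following line by line the proof of (B.10) we conclude

$$\sum_{j=1}^{G_\varepsilon}\varepsilon\Lambda_j\,\mathbb E_{\theta^\circ}\{\xi_j^2P_{M|Y}(G^{\star+}_\varepsilon<M\le G_\varepsilon)\}
\le6\Lambda_1\exp(-G_\varepsilon/6)+2\Lambda_1G_\varepsilon\,\mathbb E_{\theta^\circ}P_{M|Y}(G^{\star+}_\varepsilon<M\le G_\varepsilon). \tag{B.18}$$

We distinguish two cases. First, if $G^{\star+}_\varepsilon=G_\varepsilon$, then assertion (i) follows by combining
(B.17) and $\mathbb E_{\theta^\circ}\xi_j^2=1$. Second, if $G^{\star+}_\varepsilon<G_\varepsilon$, then the definition (4.5) of $G^{\star+}_\varepsilon$ implies
$G_\varepsilon>5m^\star_\varepsilon$ which in turn implies the assertion (i) by combining (B.17), $\mathbb E_{\theta^\circ}\xi_j^2=1$, (B.18)
and Lemma 4.5 (ii). Consider (ii). Following the proof of Lemma B.2 (ii) we obtain

$$\sum_{j=1}^{G_\varepsilon}(\theta^\times_j-\theta^\circ_j)^2\,\mathbb E_{\theta^\circ}\bigl\{1\{1\le M<j\}+(\sigma_j/\varsigma_j)^2\,1\{j\le M\le G_\varepsilon\}\bigr\}+\sum_{j>G_\varepsilon}(\theta^\times_j-\theta^\circ_j)^2$$
$$\le\|\theta^\times-\theta^\circ\|^2_{\ell^2}\bigl\{\mathbb E_{\theta^\circ}P_{M|Y}(1\le M<G^{\star-}_\varepsilon)+d^{-2}\varepsilon\Lambda_{(G^{\star-}_\varepsilon)}\bigr\}+\sum_{j>G^{\star-}_\varepsilon}(\theta^\times_j-\theta^\circ_j)^2.$$

The assertion (ii) follows now by combining the last estimate and Lemma 4.5 (i),
which completes the proof. □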

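Proof of Theorem 4.7. The proof follows line by line the proof of Theorem 4.4 using
Lemma B.3 rather than Lemma B.2, more precisely from Lemma B.3 follows

$$\mathbb E_{\theta^\circ}\|\widehat\theta-\theta^\circ\|^2_{\ell^2}\le\sum_{j=1}^{G_\varepsilon}2\sigma_j^2\lambda_j^2\varepsilon^{-2}\,\mathbb E_{\theta^\circ}\bigl\{(Y_j-\lambda_j\theta^\circ_j)P_{M|Y}(j\le M\le G_\varepsilon)\bigr\}^2$$
$$\quad+\sum_{j=1}^{G_\varepsilon}2(\theta^\times_j-\theta^\circ_j)^2\,\mathbb E_{\theta^\circ}\bigl\{(\sigma_j/\varsigma_j)P_{M|Y}(j\le M\le G_\varepsilon)+P_{M|Y}(1\le M<j)\bigr\}^2+\sum_{j>G_\varepsilon}(\theta^\times_j-\theta^\circ_j)^2$$
$$\le2\bigl\{\varepsilon G^{\star+}_\varepsilon\overline\Lambda_{G^{\star+}_\varepsilon}+10\Lambda_1\exp\bigl(-\tfrac15m^\star_\varepsilon+2\log G_\varepsilon\bigr)\bigr\}
+2\bigl\{\mathfrak b_{G^{\star-}_\varepsilon}+\|\theta^\times-\theta^\circ\|^2_{\ell^2}\{d^{-2}\varepsilon\Lambda_{(G^{\star-}_\varepsilon)}+2\exp\bigl(-\tfrac15m^\star_\varepsilon+\log G_\varepsilon\bigr)\}\bigr\}.$$

Taking further into account the definition (4.5) of $G^{\star-}_\varepsilon$ and $G^{\star+}_\varepsilon$, we have $\mathfrak b_{G^{\star-}_\varepsilon}\le
8L_\lambda C_\lambda(1+1/d)(1\vee r)\Phi^\star_\varepsilon$ and (keeping in mind Assumption A.3) $G^{\star+}_\varepsilon\le D^\star m^\star_\varepsilon$ with
$D^\star:=D^\star(\Theta^r_{\mathfrak a},\lambda):=\lceil5L_\lambda(1\vee r)/\kappa^\star\rceil$, which in turn implies $\varepsilon G^{\star+}_\varepsilon\overline\Lambda_{G^{\star+}_\varepsilon}\le L_\lambda D^\star\Lambda_{(D^\star)}\Phi^\star_\varepsilon$,
while trivially $\varepsilon\Lambda_{(G^{\star-}_\varepsilon)}\le\varepsilon\Lambda_{(m^\star_\varepsilon)}\le\Phi^\star_\varepsilon$ and $\|\theta^\times-\theta^\circ\|^2_{\ell^2}\le r$. By combination of these
estimates we obtain uniformly for all $\theta^\circ\in\Theta^r_{\mathfrak a}$ that

$$\mathbb E_{\theta^\circ}\|\widehat\theta-\theta^\circ\|^2_{\ell^2}\le\bigl\{2L_\lambda D^\star\Lambda_{(D^\star)}+16L_\lambda C_\lambda(1+1/d)(1\vee r)+2d^{-2}r\bigr\}\Phi^\star_\varepsilon
+\bigl\{(20\Lambda_1+4r)\exp\bigl(-\tfrac15m^\star_\varepsilon+2\log G_\varepsilon-\log\Phi^\star_\varepsilon\bigr)\bigr\}\Phi^\star_\varepsilon.$$

Note that in the last display the multiplicative factors of $\Phi^\star_\varepsilon$ depend only on the class
$\Theta^r_{\mathfrak a}$, the constant $d$ and the sequence $\lambda$. Thereby, the assertion of the theorem follows
from $\log(G_\varepsilon/\Phi^\star_\varepsilon)/m^\star_\varepsilon\to0$ as $\varepsilon\to0$, which completes the proof. □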